CN107665356A - A kind of image labeling method - Google Patents

An image labeling method

Info

Publication number
CN107665356A
CN107665356A (application CN201710969648.3A)
Authority
CN
China
Prior art keywords
image
labeling method
word
image labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710969648.3A
Other languages
Chinese (zh)
Inventor
吕学强
董志安
李宝安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN201710969648.3A priority Critical patent/CN107665356A/en
Publication of CN107665356A publication Critical patent/CN107665356A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The present invention relates to an image labeling method comprising the following steps: step 1) defines the objective function of the image labeling model; step 2) inputs the image into a CNN model to obtain primitive image features; step 3) weights the primitive image features; step 4) inputs the information into an LSTM model; step 5) back-propagates the error produced by the prediction result. In the image labeling method provided by the invention, low-level image features are first extracted by a convolutional neural network; a focus (attention) mechanism then extracts, from specific image regions, the image features related to the label words and inputs them into a long short-term memory network model to generate the corresponding predicted label words, finally realizing image labeling. The method has excellent labeling performance and high labeling precision, and can well meet the needs of practical application.

Description

An image labeling method
Technical field
The invention belongs to the technical field of image processing, and in particular relates to an image labeling method.
Background art
In recent years, researchers have been devoted to the study of computer understanding of image semantics. Automatic image annotation lets a computer automatically attach keywords to the entities in an image, and is a key technology in the field of image retrieval. With the rapid development of multimedia and Internet information technology, hundreds of millions of new images appear on the Internet every day. Compared with text, images describe information more intuitively and more accurately, so in today's era of information explosion images let users obtain the information they need more conveniently, faster, and more accurately. Image information has become one of the most important channels of information dissemination. Therefore, how to help users quickly and accurately find the required images in such massive image data has become a research hotspot in the field of multimedia information technology in recent years, and automatic image annotation, as one of the key technologies of image retrieval, has become an important topic studied by numerous researchers.
As an important technology in the field of image retrieval, automatic image annotation has high research significance and commercial value. Since it was proposed in 2000, numerous researchers have devoted themselves to related research, and many automatic image annotation methods have been proposed. Although these methods improve the accuracy and efficiency of image retrieval to a certain extent, owing to the "semantic gap" of images the accuracy of current retrieval systems based on automatic image annotation is still unsatisfactory: the technology is still in a development stage, and insufficient labeling performance and insufficient labeling precision are defects of the prior art. Image information has become an important channel of information dissemination on the Internet. At present, Flickr, the world's largest image sharing platform, has close to one billion users and holds over ten billion images. Quickly and accurately retrieving the images a user needs from such a huge image library is an urgent demand of today's big-data era, yet most current automatic image annotation techniques generalize poorly on libraries of this scale, so research on new automatic image annotation techniques under big data is of great significance.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide an image labeling method with excellent labeling performance and high labeling precision.
To achieve the above object, the technical solution provided by the invention is as follows:
An image labeling method comprises the following steps:
Step 1) defines the objective function of the image labeling model;
Step 2) inputs the image into a CNN model to obtain primitive image features;
Step 3) weights the primitive image features;
Step 4) inputs the information into an LSTM model;
Step 5) back-propagates the error produced by the prediction result.
Further, the objective function in step 1) is θ* = arg max_θ Σ_{(I,y)} log p(y | I; θ), where y = {y_1, ..., y_N}; θ represents all parameters to be trained in the model, I represents an image, y represents the finally predicted label combination, i.e. the final label words, K represents the number of words in the vocabulary, and N represents the number of label words.
Further, the primitive image features in step 2) are the feature map of a convolutional layer preceding the fully connected layers of the CNN; the primitive image features consist of L D-dimensional features, and each D-dimensional feature maps to a different location region of the original image.
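As an illustrative sketch (not part of the original disclosure), the reshaping of such a convolutional feature map into L D-dimensional location features can be expressed in NumPy; the 512-channel 14×14 map below is an assumed example size, not one specified by the invention:

```python
import numpy as np

def to_annotation_vectors(feature_map):
    """Flatten a convolutional feature map of shape (D, H, W) into
    L = H*W D-dimensional vectors, one per spatial location region."""
    D, H, W = feature_map.shape
    return feature_map.reshape(D, H * W).T  # shape (L, D)

# e.g. an assumed 512-channel 14x14 map -> L = 196 vectors of dimension D = 512
a = to_annotation_vectors(np.random.rand(512, 14, 14))
```

Each row of `a` then corresponds to one location region of the original image, which is what the focus mechanism weights.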
Further, step 3) weights the primitive image features with a focus weight vector α_t; the focus weight vector α_t is an L-dimensional vector, and the value of each dimension represents the weight of the image feature at the corresponding location.
The focus weight vector α_t = softmax(W_e e_t), where e_t = f(W_a a + W_h h_{t-1}), f(·) being an activation function; e_t represents the intermediate state information of the focus mechanism at time t, a represents the primitive image features, and h_{t-1} represents the output of the LSTM model at time t-1.
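The focus-weight computation described above can be sketched as follows; this is a non-authoritative illustration in which the activation is assumed to be tanh and all weight shapes are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def focus_weights(a, h_prev, W_a, W_h, W_e):
    """alpha_t = softmax(W_e * e_t), where the intermediate state e_t is
    built from the image features a (shape (L, D)) and the previous LSTM
    output h_prev; tanh as the activation is an assumption of this sketch."""
    e_t = np.tanh(a @ W_a + h_prev @ W_h)  # (L, k) intermediate state
    return softmax(e_t @ W_e)              # (L,) weights, one per location
```

The result is an L-dimensional vector of non-negative weights summing to 1, matching the description of α_t.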
Further, in step 4), the LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word encoding parameter and W_z is the image feature encoding parameter; y_{t-1} is the correct label word of the image, and z_t is the image feature weighted at the current time with the focus weight parameters.
Further, the correct label word group of the image, Y = (y_0, y_1, y_2, ..., y_t, ..., y_n), is input into the LSTM model in order starting at time t=1, where y_0 is a special word "start" indicating the beginning of the labeling process and y_n is another special word "end" indicating its end; y_{t-1} is input into the LSTM model after encoding with the word vector encoding parameter W_y, and z_t is input into the LSTM model after encoding with the image feature encoding parameter W_z.
Further, the correct label words use a one-hot encoding: each is an N-dimensional vector, N being the number of words in the word lexicon, whose entries are all 0 except for a 1 at the position of the corresponding label word.
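A minimal sketch of this one-hot encoding of label words (the five-word lexicon is invented for illustration):

```python
import numpy as np

def one_hot(word, vocab):
    """Encode a label word as an N-dimensional vector (N = lexicon size)
    that is 1 at the word's position and 0 everywhere else."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

vocab = ["start", "dog", "grass", "run", "end"]  # invented toy lexicon
one_hot("grass", vocab)  # -> [0., 0., 1., 0., 0.]
```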
Further, step 5) uses a loss function that sums the log-likelihood probabilities of all predicted label words being correct and negates the sum; the loss function is defined as L(I, y) = -Σ_{t=1}^{N} log p_t(y_t).
Further, step 5) also includes continuously updating the parameters in the model using stochastic gradient descent and the chain rule of derivation.
Further, the calculation formulas of the LSTM model are as follows:
it=σ (Wixxt+Wihht-1),
ot=σ (Woxxt+Wohht-1),
ft=σ (Wfxxt+Wfhht-1),
ct=ft⊙ct-1+it⊙h(Wcxxt+Wchht-1),
ht=ot⊙ct,
yt+1=Softmax (Wyht),
wherein σ(·) and h(·) are activation functions and ⊙ is element-wise (Hadamard) multiplication; i_t is the input gate, controlling the input information at time t; f_t is the forget gate, controlling the selective forgetting of the memory of the hidden layer at time t-1; o_t is the output gate, controlling the output information at time t; c_t is the memory of the hidden layer at time t, jointly decided by the hidden-layer information of the previous moment and the input of the current moment, and is the core unit of the LSTM; h_t is the output information of the hidden layer at time t; y_{t+1} is the prediction result obtained from h_t through the softmax classifier.
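A direct transcription of the above formulas into NumPy may clarify the data flow; note that, following the formulas as written, h_t = o_t ⊙ c_t without a further activation. The dictionary key names for the weight matrices are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM step following the formulas above, with h(.) = tanh.
    P holds the weight matrices Wix, Wih, Wox, Woh, Wfx, Wfh, Wcx, Wch."""
    i_t = sigmoid(P["Wix"] @ x_t + P["Wih"] @ h_prev)  # input gate
    o_t = sigmoid(P["Wox"] @ x_t + P["Woh"] @ h_prev)  # output gate
    f_t = sigmoid(P["Wfx"] @ x_t + P["Wfh"] @ h_prev)  # forget gate
    c_t = f_t * c_prev + i_t * np.tanh(P["Wcx"] @ x_t + P["Wch"] @ h_prev)
    h_t = o_t * c_t  # as in the formulas above: h_t = o_t (.) c_t
    return h_t, c_t
```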
To effectively alleviate the semantic gap between low-level image features and high-level semantics, the image labeling method provided by the invention proposes a deep neural network image labeling method based on a focus mechanism. The method first extracts low-level image features through a convolutional neural network (CNN), then uses the focus mechanism to extract, from specific image regions, the image features related to the label words and inputs them into a long short-term memory (LSTM) model to generate the corresponding predicted label words, finally realizing image labeling. Through the focus mechanism, the method effectively combines the ability of the CNN to extract image features with the ability of the LSTM to extract semantic features; it can exploit both low-level image features and high-level semantic features, better extract the image features related to image semantics, and effectively improve image labeling precision. It has excellent labeling performance and high labeling precision, and can well meet the needs of practical application.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a structural diagram of the deep neural network image labeling model based on the focus mechanism;
Fig. 3 is a schematic diagram of the basic structure of a traditional neural network model;
Fig. 4 is a schematic diagram of the conventional structure of an RNN model;
Fig. 5 is a schematic diagram of the internal structure of an LSTM neural unit.
Specific embodiments
To make the purpose, technical scheme, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
As shown in Fig. 1, an image labeling method comprises the following steps:
Step 1) establishes the deep neural network image labeling model based on the focus mechanism and defines the objective function of the model:
θ* = arg max_θ Σ_{(I,y)} log p(y | I; θ), where y = {y_1, ..., y_N}; θ represents all parameters to be trained in the model, I represents an image, y represents the finally predicted label combination, i.e. the final label words, K represents the number of words in the vocabulary, and N represents the number of label words. The structure of the image labeling model is shown in Fig. 2.
Step 2) inputs the image I into the CNN model to obtain the primitive image features a.
Considering that the focus mechanism weights the features of different image locations, the extracted primitive features should contain location information; before the fully connected layers of the CNN model, the feature map of each layer has a positional mapping relationship with the original image. The present invention selects the feature map of a convolutional layer preceding the fully connected layers of the CNN as the primitive image features; these consist of L D-dimensional features, each D-dimensional feature mapping to a different location region of the original image.
Step 3) weights the primitive image features with the focus weight vector.
The focus mechanism realizes, at different moments, different degrees of attention to the features of different location regions; this attention to different locations is controlled by the focus weights α_t. As shown in Fig. 2, starting from time t=1 the model produces a focus weight vector α_t at every moment. α_t is an L-dimensional vector whose elements sum to 1, i.e. Σ_{i=1}^{L} α_{t,i} = 1; the value of each dimension represents the weight of the image feature at the corresponding location. It is calculated as in formulas (4.4) and (4.5):
e_t = f(W_a a + W_h h_{t-1}) (4.4);
α_t = softmax(W_e e_t) (4.5);
where e_t represents the intermediate state information of the focus mechanism at time t. When t=0, e_0 is obtained from the image features a; when t>0, e_t is jointly decided by the output h_{t-1} of the LSTM model at time t-1 and the intermediate state information e_{t-1} of the focus mechanism at time t-1. e_{t-1} can be understood as the memory module of the focus mechanism model: it remembers the attention paid to image location regions at all moments before t. This process can be understood intuitively: determining the image region to attend to at the current moment requires both the image region information attended to at previous moments (provided by e_{t-1}) and the semantic information memorized in the LSTM model at previous moments (provided by h_{t-1}). α_t is obtained from e_t after decoding with the focus weight decoding parameter W_e and passing through the softmax classifier. At the start of training, the focus weights α_t produced by the focus mechanism model cannot accurately focus the image features on the position, within the image, of the label word predicted at the current moment; that is, there is a gap between the weighted image features obtained with α_t and the weighted image features that focus exactly on the current predicted label word. As training proceeds, the parameters W_a, W_h, W_e of the focus mechanism model are continuously updated, this gap steadily decreases, and the final focus mechanism model can realize accurate focusing.
The weighted image features input into the LSTM model are z_t; the z_t at time t is obtained by multiplying the primitive image features a by the focus weights α_t, and controls the attention paid at time t to the features of different image locations. The position focused on by the weighted image feature z_t input into the LSTM model at time t is exactly the location of the label word predicted by the output of the LSTM model at time t.
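The weighting of the primitive features by the focus weights can be sketched as a soft-attention sum over the L locations; reading "multiplying" as the standard attention-weighted sum is an assumption of this sketch:

```python
import numpy as np

def weighted_features(a, alpha_t):
    """Soft-attention context z_t = sum_i alpha_{t,i} * a_i: a single
    D-dimensional vector emphasising the attended location regions."""
    return alpha_t @ a  # (L,) @ (L, D) -> (D,)

a = np.eye(4)                         # 4 locations with 4-dim features
alpha = np.array([0.7, 0.1, 0.1, 0.1])
z = weighted_features(a, alpha)       # -> [0.7, 0.1, 0.1, 0.1]
```

With one-hot-like weights, z_t reduces to the feature of the single attended location, which is the intuition behind "focusing".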
Step 4) inputs the information into the LSTM model: the correct label word group of the image and the weighted image features are input into the LSTM model.
The LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word encoding parameter and W_z is the image feature encoding parameter. x_t consists of two parts. The first, y_{t-1}, is the correct label word of the image (in one-hot form: an N-dimensional vector, N being the number of words in the word lexicon, whose entries are all 0 except for a 1 at the position of the corresponding label word). The correct label word group Y = (y_0, y_1, y_2, ..., y_t, ..., y_n) is input into the LSTM model in order starting at time t=1, where y_0 is a special word "start" indicating the beginning of the labeling process and y_n is another special word "end" indicating its end; y_{t-1} enters the LSTM model after encoding with the word vector encoding parameter W_y. The other part of x_t is the image feature z_t weighted at the current time with the focus weight parameters; z_t enters the LSTM model after encoding with the image feature encoding parameter W_z.
The output information h_t of the LSTM hidden layer at each moment yields the prediction result p_{t+1} after decoding with the output decoding parameter W_p: p_{t+1} = g(W_p·h_t + b_p), where g(·) denotes the softmax classifier. p_{t+1} is the prediction probability, obtained with the LSTM model, of the label word following the label word input to the LSTM at the current moment. But there is a gap between the predicted label word obtained from p_{t+1} and the next correct label word, i.e. the prediction result produces an error. This error must be back-propagated so that, as the model trains, the gap between the LSTM model's prediction at each moment and the correct prediction becomes smaller and smaller, finally yielding an image labeling model of higher precision.
Step 5) back-propagates the error produced by the prediction result: the log-likelihood probabilities of all predicted label words being correct are summed and the sum is negated.
The training process of this model is the back-propagation of the error and the updating of the model parameters. The loss function is defined as L(I, y) = -Σ_{t=1}^{N} log p_t(y_t): the result of summing the log-likelihood probability values of all predicted label words being correct and taking the negative.
The parameters are updated with stochastic gradient descent (SGD) and the chain rule of derivation: training continuously updates the parameters in the model so that the loss value L(I, y) is as small as possible. These parameters include the internal parameters of the LSTM model, the focus weight parameters (W_a, W_h, W_e), the word encoding parameter W_y, the image feature encoding parameter W_z, the output decoding parameter W_p, etc. (the present invention directly uses a trained CNN model to extract image features, so the CNN model parameters are not updated); the above parameters are shared at every moment of model training.
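The loss defined above, the negated sum of the log-likelihoods of the correct label words, can be sketched as follows (the two-step, three-word example is invented for illustration):

```python
import numpy as np

def caption_loss(step_probs, correct_ids):
    """L(I, y) = -sum_t log p_t(y_t): negated sum of the log-likelihoods
    assigned to the correct label word at each time step."""
    return -sum(np.log(p[y]) for p, y in zip(step_probs, correct_ids))

# two steps over an invented 3-word vocabulary; correct word ids: 0 then 2
probs = [np.array([0.5, 0.3, 0.2]), np.array([0.1, 0.1, 0.8])]
loss = caption_loss(probs, [0, 2])  # -(log 0.5 + log 0.8)
```

Minimizing this loss pushes the probability of each correct label word toward 1, which is what the SGD updates achieve.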
The CNN is a kind of feedforward neural network containing two unique hidden-layer structures, convolutional layers and pooling layers. With its good feature extraction ability, it is at present widely applied in fields such as image, video, and speech.
The CNN has a unique network structure, mainly reflected in two aspects. First, the neurons of one layer are not fully connected to the neurons of the previous layer; perception between neurons is local. Second, in the connection process the neurons share identical weights. This unique structure of local perception and weight sharing is close to a biological neural network; such a model can effectively reduce the parameters of the network and effectively lower its complexity. A convolutional layer in a CNN is composed of several convolution kernels; a convolution kernel is a filter of size M×M used to extract a certain local feature at each local position of the previous layer's receptive field. A pooling layer reduces the dimensionality of the previous layer's convolution features; concretely, the previous layer's convolution features are divided into multiple N×N regions and the average (or maximum) value of each region is extracted as the feature after dimensionality reduction. After a series of convolutional layers, pooling layers, and fully connected layers, a CNN usually connects to a softmax classifier to handle multi-class classification problems.
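The pooling operation described above (dividing the feature map into N×N regions and keeping the maximum of each) can be sketched for non-overlapping 2×2 regions:

```python
import numpy as np

def max_pool(feature, n=2):
    """Downsample a 2-D convolutional feature by keeping the maximum of
    each non-overlapping n x n region (assumes n divides both sides)."""
    H, W = feature.shape
    return feature.reshape(H // n, n, W // n, n).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
max_pool(x)  # -> [[ 5,  7], [13, 15]]
```

Replacing `max` with `mean` gives the average-pooling variant also mentioned above.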
The recurrent neural network (Recurrent Neural Network, hereinafter RNN) has a unique structure with a memory function. A neural network model contains the three-layer structure of input layer, hidden layer, and output layer. In a traditional neural network model there are connections between nodes of successive layers, from input layer to hidden layer to output layer, but no connections between nodes within a layer; the specific structure is shown in Fig. 3. Such a traditional neural network model contains no memory function and is helpless for problems that must be computed from information already produced. For example, to predict the next word in a sentence, in most cases the words produced before are needed: in a sentence like "I am a basketball player, and I like to play basketball", the "play basketball" of the second clause must be inferred from the "basketball player" of the first. The RNN model can remember information produced at previous moments and apply it in the calculation of the current moment. This benefits from the change in structure of the RNN compared with the traditional neural network model: the input of the RNN hidden layer contains not only the output of the input layer at the current moment but also the output information of the hidden layer at the previous moment, i.e. the nodes inside the hidden layer are connected; the specific structure is shown in Fig. 4.
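The hidden-layer recurrence that gives the RNN its memory — combining the current input with the previous hidden state — can be sketched as follows (all weight names and shapes are illustrative):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy):
    """Plain RNN: each hidden state mixes the current input with the
    previous hidden state, so earlier inputs influence later outputs."""
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)  # carries memory forward
        ys.append(W_hy @ h)               # per-step output
    return ys
```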
LSTM (Long Short-Term Memory) is an improved model of the RNN; its internal cell structure is shown in Fig. 5. The calculation of the LSTM model is given by formulas (3.1)-(3.6), where σ(·) and h(·) are activation functions and ⊙ is element-wise (Hadamard) multiplication. i_t is the input gate, controlling the input information at time t. f_t is the forget gate, controlling the selective forgetting of the memory of the hidden layer at time t-1. o_t is the output gate, controlling the output information at time t. c_t is the memory of the hidden layer at time t, jointly decided by the hidden-layer information of the previous moment and the input of the current moment, and is the core unit of the LSTM. h_t is the output information of the hidden layer at time t. y_{t+1} is the prediction result obtained from h_t through the softmax classifier.
it=σ (Wixxt+Wihht-1) (3.1);
ot=σ (Woxxt+Wohht-1) (3.2);
ft=σ (Wfxxt+Wfhht-1) (3.3);
ct=ft⊙ct-1+it⊙h(Wcxxt+Wchht-1) (3.4);
ht=ot⊙ct (3.5);
yt+1=Softmax(Wyht) (3.6).
To effectively alleviate the semantic gap between low-level image features and high-level semantics, the image labeling method provided by the invention proposes a deep neural network image labeling method based on a focus mechanism: the method first extracts low-level image features through a convolutional neural network (CNN), then uses the focus mechanism to extract, from specific image regions, the image features related to the label words and inputs them into a long short-term memory (LSTM) model to generate the corresponding predicted label words, finally realizing image labeling. The method fuses the semantic information of the image and reduces the difference between low-level image features and high-level image semantics, which is significant for the accurate understanding of image semantics and can effectively improve image labeling precision. Through the focus mechanism, the method effectively combines the ability of the CNN to extract image features with the ability of the LSTM to extract semantic features; it can exploit both low-level image features and high-level semantic features, better extract the image features related to image semantics, and effectively improve image labeling precision. It has excellent labeling performance and high labeling precision, and can well meet the needs of practical application.
The embodiments described above only express embodiments of the present invention; their description is relatively specific and detailed, but may not therefore be interpreted as limiting the scope of the claims of the invention. It should be pointed out that persons of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and these belong to the scope of protection of the invention. Therefore, the scope of protection of the patent of the present invention shall be determined by the appended claims.

Claims (10)

1. An image labeling method, characterized by comprising the following steps:
Step 1) defines the objective function of the image labeling model;
Step 2) inputs the image into a CNN model to obtain primitive image features;
Step 3) weights the primitive image features;
Step 4) inputs the information into an LSTM model;
Step 5) back-propagates the error produced by the prediction result.
2. The image labeling method according to claim 1, characterized in that the objective function in step 1) is θ* = arg max_θ Σ_{(I,y)} log p(y | I; θ), where y = {y_1, ..., y_N}.
3. The image labeling method according to any one of claims 1-2, characterized in that the primitive image features in step 2) are the feature map of a convolutional layer preceding the fully connected layers of the CNN; the primitive image features consist of L D-dimensional features, and each D-dimensional feature maps to a different location region of the original image.
4. The image labeling method according to any one of claims 1-3, characterized in that step 3) weights the primitive image features with a focus weight vector α_t; the focus weight vector α_t is an L-dimensional vector, and the value of each dimension represents the weight of the image feature at the corresponding location.
The focus weight vector α_t = softmax(W_e e_t), where e_t = f(W_a a + W_h h_{t-1}), f(·) being an activation function; e_t represents the intermediate state information of the focus mechanism at time t, a represents the primitive image features, and h_{t-1} represents the output of the LSTM model at time t-1.
5. The image labeling method according to any one of claims 1-4, characterized in that in step 4) the LSTM input information is x_t = [W_y y_{t-1}, W_z z_t], where W_y is the word encoding parameter and W_z is the image feature encoding parameter; y_{t-1} is the correct label word of the image, and z_t is the image feature weighted at the current time with the focus weight parameters.
6. The image labeling method according to any one of claims 1-5, characterized in that the correct label word group of the image, Y = (y_0, y_1, y_2, ..., y_t, ..., y_n), is input into the LSTM model in order starting at time t=1, where y_0 is a special word "start" indicating the beginning of the labeling process and y_n is another special word "end" indicating its end; y_{t-1} is input into the LSTM model after encoding with the word vector encoding parameter W_y; z_t is input into the LSTM model after encoding with the image feature encoding parameter W_z.
7. The image labeling method according to any one of claims 1-5, characterized in that the correct label words use a one-hot encoding: each is an N-dimensional vector, N being the number of words in the word lexicon, whose entries are all 0 except for a 1 at the position of the corresponding label word.
8. The image labeling method according to any one of claims 1-7, characterized in that step 5) uses a loss function that sums the log-likelihood probabilities of all predicted label words being correct and negates the sum; the loss function is defined as
L(I, y) = -Σ_{t=1}^{N} log p_t(y_t).
9. The image labeling method according to any one of claims 1-8, characterized in that step 5) also includes continuously updating the parameters in the model using stochastic gradient descent and the chain rule of derivation.
10. The image labeling method according to any one of claims 1-9, characterized in that the calculation formulas of the LSTM model are as follows:
it=σ (Wixxt+Wihht-1),
ot=σ (Woxxt+Wohht-1),
ft=σ (Wfxxt+Wfhht-1),
ct=ft⊙ct-1+it⊙h(Wcxxt+Wchht-1),
ht=ot⊙ct,
yt+1=Softmax (Wyht)。
CN201710969648.3A 2017-10-18 2017-10-18 A kind of image labeling method Pending CN107665356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710969648.3A CN107665356A (en) 2017-10-18 2017-10-18 A kind of image labeling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710969648.3A CN107665356A (en) 2017-10-18 2017-10-18 A kind of image labeling method

Publications (1)

Publication Number Publication Date
CN107665356A true CN107665356A (en) 2018-02-06

Family

ID=61098761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710969648.3A Pending CN107665356A (en) 2017-10-18 2017-10-18 A kind of image labeling method

Country Status (1)

Country Link
CN (1) CN107665356A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107076567A (en) * 2015-05-21 2017-08-18 百度(美国)有限责任公司 Multilingual image question and answer
CN105469065A (en) * 2015-12-07 2016-04-06 中国科学院自动化研究所 Recurrent neural network-based discrete emotion recognition method
CN107066464A (en) * 2016-01-13 2017-08-18 奥多比公司 Semantic Natural Language Vector Space
CN105701516A (en) * 2016-01-20 2016-06-22 福州大学 Method for automatically marking image on the basis of attribute discrimination
CN105938485A (en) * 2016-04-14 2016-09-14 北京工业大学 Image description method based on convolution cyclic hybrid model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KELVIN XU et al.: "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", published online: https://arxiv.org/abs/1502.03044 *
ORIOL VINYALS et al.: "Show and Tell: A Neural Image Caption Generator", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
ZHAO GUO et al.: "Attention-based LSTM with Semantic Consistency for Videos Captioning", Proceedings of the 24th ACM International Conference on Multimedia *
LIANG Huan: "Research on Image Semantic Understanding Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665506A (en) * 2018-05-10 2018-10-16 腾讯科技(深圳)有限公司 Image processing method, device, computer storage media and server
CN108665506B (en) * 2018-05-10 2021-09-28 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer storage medium and server
CN109032356A (en) * 2018-07-27 2018-12-18 深圳绿米联创科技有限公司 Sign language control method, apparatus and system
CN109146858A (en) * 2018-08-03 2019-01-04 诚亿电子(嘉兴)有限公司 The secondary method of calibration of automatic optical inspection device problem
CN109146858B (en) * 2018-08-03 2021-09-17 诚亿电子(嘉兴)有限公司 Secondary checking method for problem points of automatic optical checking equipment
CN109343920A (en) * 2018-09-10 2019-02-15 深圳市腾讯网络信息技术有限公司 A kind of image processing method and its device, equipment and storage medium
CN109343920B (en) * 2018-09-10 2021-09-07 深圳市腾讯网络信息技术有限公司 Image processing method and device, equipment and storage medium thereof
WO2020186484A1 (en) * 2019-03-20 2020-09-24 深圳大学 Automatic image description generation method and system, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN107665356A (en) A kind of image labeling method
CN110390397B (en) Text inclusion recognition method and device
CN110134946B (en) Machine reading understanding method for complex data
CN108628935B (en) Question-answering method based on end-to-end memory network
CN111291556B (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN106855853A (en) Entity relation extraction system based on deep neural network
CN108875074A (en) Based on answer selection method, device and the electronic equipment for intersecting attention neural network
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110096567A (en) Selection method, system are replied in more wheels dialogue based on QA Analysis of Knowledge Bases Reasoning
CN109597876A (en) A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
CN112699216A (en) End-to-end language model pre-training method, system, device and storage medium
CN109214006A (en) The natural language inference method that the hierarchical semantic of image enhancement indicates
CN113743099B (en) System, method, medium and terminal for extracting terms based on self-attention mechanism
CN114428850B (en) Text retrieval matching method and system
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN114969278A (en) Knowledge enhancement graph neural network-based text question-answering model
CN112215017A (en) Mongolian Chinese machine translation method based on pseudo parallel corpus construction
CN114510946B (en) Deep neural network-based Chinese named entity recognition method and system
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN115688753A (en) Knowledge injection method and interaction system of Chinese pre-training language model
CN112417890B (en) Fine granularity entity classification method based on diversified semantic attention model
Li et al. Multimodal fusion with co-attention mechanism
Xu et al. CNN-based skip-gram method for improving classification accuracy of chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180206