CN110633713A - Image feature extraction method based on improved LSTM - Google Patents


Info

Publication number: CN110633713A
Application number: CN201910889843.4A
Authority: CN (China)
Prior art keywords: lstm, output, gate, current, vector
Legal status: Withdrawn (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 李建平, 顾小丰, 胡健, 赖志龙, 苌浩阳, 蒋胜, 冯文婷
Current and Original Assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN201910889843.4A
Publication of CN110633713A

Classifications

    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The invention discloses an image feature extraction method based on improved LSTM, which comprises the following steps. S1, extracting feature vectors: inputting an original image into a convolutional neural network and extracting a corresponding feature vector. S2, acquiring image features: inputting the extracted feature vector into a trained LSTM model to obtain image features. After the method obtains the feature vector of the original image, it considers the problem that the existing LSTM model fails to respect context when generating a new word: an LSTM model that carries the above-text information is constructed in the decoding stage to perform feature extraction, which improves the accuracy of image feature extraction.

Description

Image feature extraction method based on improved LSTM
Technical Field
The invention belongs to the technical field of image feature extraction, and particularly relates to an image feature extraction method based on improved LSTM.
Background
The image features are used to describe image information; in a physical sense they generally include shape, color, texture, and spatial relationships. The shape of an image generally refers to the outline shape and the region shape: the outline shape describes the edges and represents the external form of the whole image, while the region feature describes the shape inside the image. The color feature is a global feature; it is the most obvious and most noticeable surface characteristic of an image and is represented at the level of pixels. Like the color feature, the texture feature is also a global feature and also represents the surface characteristics of an object, but it is computed over regions of many pixels. The spatial-relationship features concern the multiple entities in an image and are divided into relative spatial position and absolute spatial position: the former emphasizes relative relations, the latter emphasizes distance and coordinate orientation.
At present, extracting image features with convolutional neural networks is very common and achieves good results. The convolutional neural network belongs to the encoding stage of the automatic image description task; the models most commonly used in the decoding stage are the recurrent neural network and its variants, such as the standard RNN, LSTM, and GRU, among which LSTM, with its long-distance memory, is the most widely used. LSTM is one of the most successful variants of the RNN: it inherits most of the properties of the RNN while solving the gradient-vanishing problem the RNN suffers during back-propagation. In natural language processing, LSTM is particularly good at sequence-related tasks such as dialog systems, machine translation, and image description. Although feedforward networks represented by the convolutional neural network still hold an absolute advantage in performance and effect on classification tasks, on sequence-processing tasks the convolutional neural network cannot match the recurrent neural network, and LSTM more vividly expresses and simulates human behavioral characteristics, logical thinking, and cognition.
Analysis of the existing LSTM reveals that, when a new word is generated, the new word is affected at the sentence level almost only by the word immediately preceding it, while the other words have little influence on it. This does not reflect the full context, so the existing LSTM has difficulty extracting accurate image features.
Disclosure of Invention
Aiming at the above defects in the prior art, the image feature extraction method based on the improved LSTM solves the problem that, in the existing image feature extraction process, context information is difficult to take into account when the LSTM is used for decoding, which makes the extracted features inaccurate.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: an improved LSTM-based image feature extraction method comprises the following steps:
s1, extracting feature vectors: inputting an original image into a convolutional neural network, and extracting a corresponding feature vector;
s2, acquiring image characteristics: and inputting the extracted feature vector into a trained LSTM model to obtain image features.
Further, the size of the original image in the step S1 is 128 × 128;
the convolutional neural network has a 5-layer network structure;
the number of feature vectors extracted by the convolutional neural network is 1.
Further, the convolutional neural network in step S1 includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, and a fully-connected layer, which are connected in sequence;
the first convolution layer inputs 128 x 128 original images;
the first convolution layer comprises 8 convolution kernels with the size of 5 x 5 and outputs 8 feature maps with the size of 64 x 64;
the second convolution layer comprises 16 convolution kernels with the size of 4 x 4 and outputs 16 characteristic maps of 32 x 32;
the third convolutional layer comprises 32 convolutional kernels with the size of 3 × 3, and 32 16 × 16 feature maps are output;
the fourth convolution layer comprises 64 convolution kernels with the size of 2 x 2 and outputs 64 characteristic maps of 16 x 16;
and the full connection layer connects all the characteristic graphs output by the fourth convolution layer and outputs 1 characteristic vector.
Further, the LSTM model in step S2 includes a plurality of sequentially connected LSTM units;
in the data flow of the LSTM model:
the output end of the previous LSTM unit outputs the word generated by the LSTM unit and inputs the word into the current LSTM unit; the word generated by the previous LSTM unit and the word generated by the current LSTM unit are subjected to vector dot multiplication to be used as the above information, and the above information is input into the next LSTM unit.
Furthermore, each LSTM unit comprises a forgetting gate, an input gate and an output gate which are connected in sequence;
the forgetting gate is used for determining the information which needs to be discarded by the LSTM unit;
the input gate is used for determining the quantity of information input into the current LSTM unit and updating the state and the information of the current LSTM unit;
the output gate is used for determining the state and the hidden layer state which need to be output by the current LSTM unit.
Further, the forgetting gate comprises a sigmoid function and a vector dot product;
In the data flow direction of the forgetting gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the information f_t in the current LSTM unit; f_t is then vector-dot-multiplied with the state C_{t-1} output by the previous LSTM unit, and the resulting information that needs to be discarded is input into the input gate.
Further, the input gate comprises a sigmoid function, a tanh function, a vector point multiplication and a vector accumulation;
In the data flow direction of the input gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the information i_t that needs to be updated in the current LSTM unit; h_{t-1} and x_t also produce the candidate state C̃_t of the current LSTM unit. i_t and C̃_t are vector-dot-multiplied, the result is vector-accumulated with the output of the forgetting gate, and the accumulated result is input into the output gate.
Further, the output gate comprises a sigmoid function, a tanh function and a vector dot product;
In the data flow direction of the output gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the output-gate activation; the accumulated result passed in from the input gate gives the state C_t that the current LSTM unit needs to output, and vector-dot-multiplying the sigmoid result with tanh(C_t) yields the hidden-layer state h_t that the current LSTM unit needs to output.
Further, in step S2 the optimizer used during LSTM model training is SGD;
during the LSTM model training process:
the initial learning rate is set to 5e-4 and is reduced to 0.8 times its previous value every 4 iterations; meanwhile, the batch size is set to 48 and the maximum number of iterations to 50.
The invention has the beneficial effects that:
according to the image feature method based on the improved LSTM, after the original image feature vector is obtained, the problem that the existing LSTM model does not conform to the context when a new word is generated is considered, and the LSTM model with the above information is constructed in the decoding stage to perform feature extraction, so that the accuracy of image feature extraction is improved.
Drawings
Fig. 1 is a flow chart of the image feature extraction method based on the improved LSTM provided by the present invention.
FIG. 2 is a diagram of the LSTM model structure in the present invention.
Fig. 3 is a diagram showing the structure of an LSTM unit in the present invention.
Fig. 4 is a model test picture in an embodiment provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments. For those skilled in the art, various changes that remain within the spirit and scope of the invention as defined by the appended claims are all protected, as is everything produced using the inventive concept.
As shown in fig. 1, an improved LSTM-based image feature extraction method includes the following steps:
s1, extracting feature vectors: inputting an original image into a convolutional neural network, and extracting a corresponding feature vector;
s2, acquiring image characteristics: and inputting the extracted feature vector into a trained LSTM model to obtain image features.
The size of the original image in the above-described step S1 is 128 × 128.
In biology, the receptive field is a property of the nervous system for vision, hearing, and other senses: the range of stimulation each unit receives is fixed. The convolutional neural network is modeled on this idea and has two characteristics, local connectivity and weight sharing. In general, a convolutional neural network performs two main tasks, feature extraction and feature mapping. The feature-extraction layer is responsible for extracting the features of the neurons connected to the previous layer; once the features are extracted, the mapping relation between neurons is determined, i.e., the local connection. The feature-mapping layer is a geometric plane on which all weights are equal, i.e., the weights are shared; a typical feature-mapping layer adopts the sigmoid activation function, which gives the mapping between features displacement invariance. Weight sharing gives the convolutional neural network fewer parameters and faster training.
Therefore, the convolutional neural network for extracting the original image features comprises a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a full-connection layer which are sequentially connected;
inputting a 128 x 128 original image into a first convolution layer;
the first convolution layer comprises 8 convolution kernels with the size of 5 multiplied by 5, and 8 feature maps with the size of 64 multiplied by 64 are output;
the second convolution layer comprises 16 convolution kernels with the size of 4 x 4 and outputs 16 characteristic maps of 32 x 32;
the third convolution layer comprises 32 convolution kernels with the size of 3 x 3 and outputs 32 characteristic maps with the size of 16 x 16;
the fourth convolution layer comprises 64 convolution kernels with the size of 2 x 2 and outputs 64 characteristic maps of 16 x 16;
the full-connection layer connects all the feature maps output by the fourth convolution layer and outputs 1 feature vector.
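As a quick sanity check on the layer sizes listed above, the following sketch tracks only the output shapes stated in the text (strides and padding are not specified in the patent, so they are deliberately left out) and computes the length of the flattened input to the fully connected layer:

```python
# Layer spec as stated in the patent: (name, out_channels, kernel_size, out_spatial).
# Only the declared output shapes are tracked; how each shape is realized
# (stride, padding, pooling) is not given in the text.
spec = [
    ("conv1", 8, 5, (64, 64)),
    ("conv2", 16, 4, (32, 32)),
    ("conv3", 32, 3, (16, 16)),
    ("conv4", 64, 2, (16, 16)),
]

def feature_vector_length(spec):
    """Length of the single vector the fully connected layer flattens to."""
    out_channels, (h, w) = spec[-1][1], spec[-1][3]
    return out_channels * h * w

print(feature_vector_length(spec))  # 64 * 16 * 16 = 16384
```

Since conv1 through conv3 each halve the spatial size while conv4 keeps 16 × 16, one plausible realization (an assumption, not stated in the patent) is stride 2 with suitable padding for the first three layers and stride 1 for the fourth.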
From the perspective of natural language, regardless of the language, including English and Chinese, each word in a sentence is affected to some degree by the words around it. Analysis of the LSTM reveals that, when the LSTM model produces a new word, the new word is affected at the sentence level almost only by the word immediately preceding it, while the other words have little effect on it, which does not reflect the full context.
Based on the contextual relations among the words of a sentence and the above defect of the LSTM in generating words, the invention improves the LSTM model structure: in the process of generating a new word, the influence of almost only the single preceding word is changed into the direct influence of all the words in the context. It should be noted that, since a sentence is generated from left to right, a new word has only preceding text and no following text, so the "context" in the improved LSTM model structure of the invention refers to the above-text information. The LSTM model provided by the invention, as shown in fig. 2, therefore includes a plurality of LSTM units connected in sequence;
in the data flow of the LSTM model:
the output end of the previous LSTM unit outputs the word generated by the LSTM unit and inputs the word into the current LSTM unit; the word generated by the previous LSTM unit and the word generated by the current LSTM unit are subjected to vector dot multiplication to be used as the above information, and the above information is input into the next LSTM unit.
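A minimal sketch of this data flow follows. The patent's "vector dot multiplication" of two word vectors is read here as an elementwise (Hadamard) product — an assumption, since the operation must again yield a vector that can be passed to the next LSTM unit as the above-text information:

```python
import numpy as np

def context_info(prev_word_vec, curr_word_vec):
    """Combine the previous unit's word with the current unit's word.

    Assumption: "vector dot multiplication" in the patent denotes an
    elementwise (Hadamard) product, so the result is itself a vector
    that the next LSTM unit can consume as the above-text information.
    """
    return prev_word_vec * curr_word_vec

# Toy word vectors (hypothetical values, purely illustrative):
w_prev = np.array([0.2, 0.5, 1.0])
w_curr = np.array([1.0, 0.4, 0.5])
print(context_info(w_prev, w_curr))  # [0.2 0.2 0.5]
```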
Specifically, as shown in fig. 3, each LSTM unit includes a forgetting gate, an input gate, and an output gate, which are connected in sequence;
the forgetting gate is used for determining the information which needs to be discarded by the LSTM unit;
the input gate is used for determining the quantity of information input into the current LSTM unit and updating the state and the information of the current LSTM unit;
the output gate is used for determining the state and hidden state required to be output by the current LSTM unit.
Wherein the forgetting gate comprises a sigmoid function and a vector dot product;
In the data flow direction of the forgetting gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the information f_t in the current LSTM unit; f_t is then vector-dot-multiplied with the state C_{t-1} output by the previous LSTM unit, and the resulting information that needs to be discarded is input into the input gate;
wherein f_t is:
f_t = σ(W_f * [h_{t-1}, x_t] + b_f)
where σ(·) is the sigmoid function;
W_f is the weight for the information f_t in the current LSTM unit;
b_f is the bias for the information f_t in the current LSTM unit.
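The forgetting-gate computation just described can be sketched as follows; the weight and bias values here are random illustrative placeholders, not parameters from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f, C_prev):
    """f_t = sigmoid(W_f * [h_{t-1}, x_t] + b_f); returns f_t ⊙ C_{t-1}."""
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)       # each entry in (0, 1)
    return f_t * C_prev                     # portion of the old state kept

# Illustrative dimensions and random parameters:
rng = np.random.default_rng(0)
hidden, inp = 4, 3
h_prev, x_t = rng.standard_normal(hidden), rng.standard_normal(inp)
W_f = rng.standard_normal((hidden, hidden + inp))
b_f = np.zeros(hidden)
C_prev = rng.standard_normal(hidden)
kept = forget_gate(h_prev, x_t, W_f, b_f, C_prev)
```

Because every entry of f_t lies strictly between 0 and 1, the gate can only shrink (never amplify) each component of the previous cell state.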
the input gate comprises a sigmoid function, a tanh function, a vector point multiplication and a vector accumulation;
In the data flow direction of the input gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the information i_t that needs to be updated in the current LSTM unit; h_{t-1} and x_t also produce the candidate state C̃_t of the current LSTM unit. i_t and C̃_t are vector-dot-multiplied, the result is vector-accumulated with the output of the forgetting gate, and the accumulated result is input into the output gate;
wherein the information i_t to be updated is:
i_t = σ(W_i * [h_{t-1}, x_t] + b_i)
where W_i is the weight of the information i_t to be updated;
b_i is the bias of the information i_t to be updated;
the candidate state C̃_t is:
C̃_t = tanh(W_C * [h_{t-1}, x_t] + b_C)
where W_C is the weight of C̃_t;
b_C is the bias of C̃_t.
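The input-gate computation can be sketched in the same style; again all parameter values are random placeholders, not values from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(h_prev, x_t, W_i, b_i, W_c, b_c):
    """i_t = sigmoid(W_i*[h,x]+b_i); C~_t = tanh(W_C*[h,x]+b_C); returns i_t ⊙ C~_t."""
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)      # how much of the candidate to admit
    c_tilde = np.tanh(W_c @ concat + b_c)  # candidate state, each entry in (-1, 1)
    return i_t * c_tilde                   # update to accumulate with the forget-gate output

# Illustrative dimensions and random parameters:
rng = np.random.default_rng(1)
hidden, inp = 4, 3
h_prev, x_t = rng.standard_normal(hidden), rng.standard_normal(inp)
W_i = rng.standard_normal((hidden, hidden + inp))
W_c = rng.standard_normal((hidden, hidden + inp))
b_i, b_c = np.zeros(hidden), np.zeros(hidden)
update = input_gate(h_prev, x_t, W_i, b_i, W_c, b_c)
```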
the output gate comprises a sigmoid function, a tanh function and a vector dot product;
In the data flow direction of the output gate:
The hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by a sigmoid function to obtain the output-gate activation; the accumulated result passed in from the input gate gives the state C_t that the current LSTM unit needs to output, and vector-dot-multiplying the sigmoid result with tanh(C_t) yields the hidden-layer state h_t that the current LSTM unit needs to output;
wherein C_t is:
C_t = f_t * C_{t-1} + i_t * C̃_t
where C_{t-1} is the state of the LSTM unit before updating.
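Putting the three gates together, one full LSTM step under the textbook equations that the description above follows can be sketched as below (all parameter values are illustrative, not from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, W, b):
    """One LSTM step: forgetting gate, input gate, then output gate.

    W and b are dicts keyed by gate name ("f", "i", "C", "o"); the
    parameter values used below are random placeholders.
    """
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ concat + b["f"])      # forgetting gate
    i_t = sigmoid(W["i"] @ concat + b["i"])      # input gate
    c_tilde = np.tanh(W["C"] @ concat + b["C"])  # candidate state
    C_t = f_t * C_prev + i_t * c_tilde           # new cell state
    o_t = sigmoid(W["o"] @ concat + b["o"])      # output gate
    h_t = o_t * np.tanh(C_t)                     # new hidden state
    return h_t, C_t

# Illustrative dimensions and random parameters:
rng = np.random.default_rng(2)
hidden, inp = 4, 3
W = {k: rng.standard_normal((hidden, hidden + inp)) for k in "fiCo"}
b = {k: np.zeros(hidden) for k in "fiCo"}
h0, C0 = np.zeros(hidden), np.zeros(hidden)
h1, C1 = lstm_step(h0, C0, rng.standard_normal(inp), W, b)
```

Since h_t = o_t ⊙ tanh(C_t) with both factors bounded by 1 in magnitude, every component of the hidden state stays inside (-1, 1).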
In step S2, the optimizer used during LSTM model training is SGD;
during the LSTM model training process:
setting the initial learning rate to be 5e-4, gradually reducing the learning rate in order to relieve the oscillation phenomenon and fall into the local minimum value in the training process, wherein the learning rate is reduced to 0.8 time of the original learning rate every 4 times of iteration; and meanwhile, setting the batch size to be 48, setting the maximum iteration number to be 50, and after the training is finished, taking the LSTM model with the highest BLEU score as the final model for outputting the image characteristics. And, in order to shorten the training time and accelerate the convergence, a batch normalization operation is added.
In one embodiment of the present invention, an experimental procedure for image feature extraction by the method of the present invention is provided:
(1) selecting an image data set;
at present, the commonly used classic data sets for image English description are MSCOCO, Flickr8k, Flickr30k and the like, and the data sets for image Chinese description are AI-Changler, Flickr8k-CN and the like. Because Chinese is more complex than English in the aspects of grammar, semantics and the like, the difficulty of image description based on Chinese is higher, and therefore the invention adopts image English description. In the experiment, an MSCOCO-2015 data set is selected as experimental data, a training set comprises about 16 ten thousand pictures, a test set and a verification set respectively comprise about 8 ten thousand pictures, and each picture is provided with 5 different manually marked English description sentences. In the experiment, a training set, a test set and a verification set are constructed according to a ratio of 8:1:1, wherein 80000 pictures are in the training set, 10000 pictures are in the test set, and 10000 pictures are in the verification set.
(2) Image data pre-processing
The data need to be preprocessed before training with the MSCOCO-2015 data set. First, the manually annotated description sentences are case-folded, converting capital letters to lowercase, to ease uniform processing of the data. Second, punctuation marks in the description sentences contribute little to model training and can even have a negative influence, so all punctuation marks are removed. Since the lengths of the description sentences vary, the maximum length of the word sequence is set to 15 after a statistical analysis of sentence lengths. In constructing the vocabulary, the threshold on word frequency is set to 8: words occurring more often than the threshold are added to the vocabulary, and words occurring less often are replaced with the placeholder token <UNK> used in natural language processing. After the vocabulary is built, words are represented as vectors using the common one-hot encoding.
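A sketch of this preprocessing pipeline under the stated settings follows. The patent does not say how words occurring exactly as often as the threshold are handled, so they are mapped to <UNK> here (an assumption):

```python
import re
from collections import Counter

UNK = "<UNK>"

def preprocess(captions, min_count=8, max_len=15):
    """Lowercase, strip punctuation, truncate to max_len, build the vocabulary.

    Words whose frequency exceeds min_count enter the vocabulary; all
    other words are replaced with the <UNK> token. Words occurring
    exactly min_count times become <UNK> (an assumption; the patent
    does not specify this boundary case).
    """
    tokenized = []
    for cap in captions:
        cap = re.sub(r"[^\w\s]", "", cap.lower())  # case-fold, drop punctuation
        tokenized.append(cap.split()[:max_len])    # cap the sequence length at 15
    counts = Counter(w for toks in tokenized for w in toks)
    vocab = {w for w, c in counts.items() if c > min_count}
    cleaned = [[w if w in vocab else UNK for w in toks] for toks in tokenized]
    return vocab, cleaned
```

One-hot encoding would then assign each vocabulary word (plus <UNK>) a distinct basis vector.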
(3) Constructing a convolution neural network with a five-layer structure, and extracting a characteristic vector in an image data set through the convolution neural network;
(4) and inputting the extracted feature vector into an LSTM model, and extracting image features.
Experimental results and analysis:
In the experiment, on the Microsoft MSCOCO-2015 data set, the proposed model and method are compared with several current image description models, such as NIC, ATT-FCN, Soft-Attention, and MSM; Table 1 shows the comparison.
Table 1: evaluation results of different image description models on MSCOCO data set
[Table 1 is provided as an image in the original document.]
In Table 1, "--" indicates that the corresponding test was not performed. B-1, B-2, B-3, B-4, and METEOR are the evaluation metrics; B-1 through B-4 denote the unigram, bigram, trigram, and 4-gram matching modes of the BLEU metric, and bold indicates the best score under that metric. As Table 1 shows, compared with the other 4 models, the model of the invention performs best under B-3 and performs well under the other 4 metrics. The MSM model outperforms NIC, Soft-Attention, and ATT-FCN under all 5 metrics, so MSM handles image description very well. The model of the invention outperforms the MSM model under B-3; under METEOR it outperforms Soft-Attention and ATT-FCN and is slightly below MSM. Moreover, under B-1, B-2, B-3, and B-4 the model of the invention outperforms the earliest proposed NIC model.
In the experiment of the invention, different iteration cycle epoch training models are tried in order to obtain better experiment effect, and table 2 shows the effect of different iteration cycles under corresponding evaluation methods.
Table 2: scoring of different iteration cycles under corresponding evaluation method
[Table 2 is provided as an image in the original document.]
As Table 2 shows, the iteration period is closely related to model performance: Bleu-1, Bleu-3, and METEOR all achieve their best results with an iteration period of 45, while Bleu-2 and Bleu-4 reach their optima at 40 and 50, respectively. Evidently the iteration period should be neither as large nor as small as possible but set to an appropriate value; the experiments in the invention show that the model performs best when the iteration period is set to 45.
Finally, in order to show the accuracy and descriptive quality of the model, after training is completed, 4 test pictures are selected for a comparison test between the model and the Muti-Head model, which is likewise an improvement based on Soft-Attention applied to the image description task; the test pictures are shown in FIG. 4.
For the test picture shown in fig. 4, the description sentences obtained by the model of the present invention are shown in table 4:
table 3: model description effects of the invention
[Table 3 is provided as an image in the original document.]
For the test pictures shown in FIG. 4, the description sentences obtained by the Muti-Head model are shown in Table 4:
table 4: Muti-Head model describes effects
[Table 4 is provided as an image in the original document.]
As Table 3 shows, the descriptions produced by the model of the invention express the image information substantially accurately, largely conform to English grammar, use mostly accurate phrase structure, and connect phrases organically and naturally. Comparison with the Muti-Head model shows that the grammar of the model of the invention is more reasonable; for example, in the Muti-Head description sentences for FIG. 4(b) and FIG. 4(c), the predicate "are" is absent. In recognizing picture boundaries the model of the invention also performs better: the Muti-Head model identifies the ground in figures (a), (b), and (c) as "field", while the model of the invention accurately identifies "ground", "court", and "grass" respectively. In recognizing target entities, both models perform well and accurately identify the target entities in the pictures.
In general, the model and method provided by the invention perform well on the image description task, in some respects even better than other image description models; they can describe the information contained in an image substantially accurately, and although some problems remain, the generated natural language generally expresses the meaning of the image.
The invention has the beneficial effects that:
according to the image feature method based on the improved LSTM, after the original image feature vector is obtained, the problem that the existing LSTM model does not conform to the context when a new word is generated is considered, and the LSTM model with the above information is constructed in the decoding stage to perform feature extraction, so that the accuracy of image feature extraction is improved.

Claims (9)

1. An image feature extraction method based on improved LSTM is characterized by comprising the following steps:
s1, extracting feature vectors: inputting an original image into a convolutional neural network, and extracting a corresponding feature vector;
s2, acquiring image characteristics: and inputting the extracted feature vector into a trained LSTM model to obtain image features.
2. The improved LSTM-based image feature extraction method of claim 1, wherein the size of the original image in step S1 is 128 x 128;
the convolutional neural network has a 5-layer network structure;
the number of feature vectors extracted by the convolutional neural network is 1.
3. The improved LSTM-based image feature extraction method of claim 1, wherein the convolutional neural network in step S1 includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer and a fully-connected layer which are connected in sequence;
the first convolution layer inputs 128 x 128 original images;
the first convolution layer comprises 8 convolution kernels with the size of 5 x 5 and outputs 8 feature maps with the size of 64 x 64;
the second convolution layer comprises 16 convolution kernels with the size of 4 x 4 and outputs 16 characteristic maps of 32 x 32;
the third convolutional layer comprises 32 convolutional kernels with the size of 3 × 3, and 32 16 × 16 feature maps are output;
the fourth convolution layer comprises 64 convolution kernels with the size of 2 x 2 and outputs 64 characteristic maps of 16 x 16;
and the full connection layer connects all the characteristic graphs output by the fourth convolution layer and outputs 1 characteristic vector.
4. The improved LSTM-based image feature extraction method of claim 1, wherein the LSTM model in step S2 comprises a plurality of sequentially connected LSTM units;
in the data flow of the LSTM model:
the output end of the previous LSTM unit outputs the word generated by the LSTM unit and inputs the word into the current LSTM unit; the word generated by the previous LSTM unit and the word generated by the current LSTM unit are subjected to vector dot multiplication to be used as the above information, and the above information is input into the next LSTM unit.
5. The improved LSTM-based image feature extraction method of claim 4, wherein each LSTM unit comprises a forgetting gate, an input gate and an output gate connected in sequence;
the forgetting gate is used for determining the information that the current LSTM unit needs to discard;
the input gate is used for determining the amount of information input into the current LSTM unit and for updating the state and information of the current LSTM unit;
the output gate is used for determining the state and the hidden-layer state that the current LSTM unit needs to output.
6. The improved LSTM-based image feature extraction method of claim 5, wherein said forgetting gate comprises a sigmoid function and a vector point multiplication;
in the data flow direction of the forgetting gate:
the hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by the sigmoid function to obtain the forgetting-gate information f_t of the current LSTM unit; f_t and the state C_{t-1} output by the previous LSTM unit are subjected to vector dot multiplication to obtain the state f_t ⊙ C_{t-1} remaining after unneeded information is discarded, which is input into the input gate.
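A pure-Python sketch of this forgetting-gate data flow; the weight matrix W_f and bias b_f are hypothetical parameters not given in the claim, and vectors are plain Python lists:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forgetting_gate(h_prev, x_t, W_f, b_f, C_prev):
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    # f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), one activation per state unit
    f_t = [sigmoid(sum(w * v for w, v in zip(row, z)) + b)
           for row, b in zip(W_f, b_f)]
    # f_t (dot) C_{t-1}: entries of f_t near 0 discard the corresponding state
    return [f * c for f, c in zip(f_t, C_prev)]
```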
7. The improved LSTM-based image feature extraction method of claim 5, wherein said input gate comprises a sigmoid function, tanh function, vector point multiplication and vector accumulation;
in the data flow direction of the input gate:
the hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by the sigmoid function to obtain the information i_t that needs to be updated in the current LSTM unit; h_{t-1} and x_t are also processed by the tanh function to obtain the candidate state C̃_t of the current LSTM unit;
i_t and C̃_t are subjected to vector dot multiplication, the result is accumulated with the output of the forgetting gate to give the updated state C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t, and the accumulated result is input into the output gate.
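A corresponding sketch of the input-gate data flow; the parameters W_i, b_i (update gate) and W_c, b_c (candidate state) are hypothetical, and `forgotten` stands for the forgetting gate's output f_t ⊙ C_{t-1}:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(W, z, b):
    # Affine map W . z + b, with W as a list of rows
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

def input_gate(h_prev, x_t, W_i, b_i, W_c, b_c, forgotten):
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in linear(W_i, z, b_i)]        # how much to update
    c_tilde = [math.tanh(a) for a in linear(W_c, z, b_c)]  # candidate state
    # C_t = f_t (dot) C_{t-1} + i_t (dot) candidate state
    return [fc + i * c for fc, i, c in zip(forgotten, i_t, c_tilde)]
```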
8. The improved LSTM-based image feature extraction method of claim 5, wherein said output gate comprises a sigmoid function, tanh function and vector dot product;
in the data flow direction of the output gate:
the hidden-layer state h_{t-1} output by the previous LSTM unit and the input x_t of the current LSTM unit are processed by the sigmoid function to obtain the output-gate information o_t; the state C_t produced by the input gate is the state that the current LSTM unit needs to output, and C_t processed by the tanh function is subjected to vector dot multiplication with o_t to obtain the hidden-layer state h_t = o_t ⊙ tanh(C_t) that the current LSTM unit needs to output.
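A sketch of the output-gate data flow under the same assumptions (hypothetical parameters W_o, b_o; C_t is the state produced by the input gate):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_gate(h_prev, x_t, W_o, b_o, C_t):
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    # o_t = sigmoid(W_o . [h_{t-1}, x_t] + b_o)
    o_t = [sigmoid(sum(w * v for w, v in zip(row, z)) + b)
           for row, b in zip(W_o, b_o)]
    # h_t = o_t (dot) tanh(C_t); both h_t and C_t flow on to the next LSTM unit
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]
    return h_t, C_t
```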
9. The improved LSTM-based image feature extraction method according to claim 1, wherein in step S2 the optimizer used for training the LSTM model is SGD (stochastic gradient descent);
during the LSTM model training process:
the initial learning rate is set to 5e-4 and is reduced to 0.8 times its current value every 4 iterations; the batch size is set to 48 and the maximum number of iterations is set to 50.
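The stated step-decay schedule can be expressed in closed form; the function below is illustrative only, not part of the claimed method:

```python
def learning_rate(iteration, base_lr=5e-4, decay=0.8, step=4):
    # Every `step` iterations the rate shrinks to `decay` times its current
    # value, so after n iterations it equals base_lr * decay ** (n // step).
    return base_lr * decay ** (iteration // step)
```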
CN201910889843.4A 2019-09-20 2019-09-20 Image feature extraction method based on improved LSTM Withdrawn CN110633713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910889843.4A CN110633713A (en) 2019-09-20 2019-09-20 Image feature extraction method based on improved LSTM

Publications (1)

Publication Number Publication Date
CN110633713A true CN110633713A (en) 2019-12-31

Family

ID=68971760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910889843.4A Withdrawn CN110633713A (en) 2019-09-20 2019-09-20 Image feature extraction method based on improved LSTM

Country Status (1)

Country Link
CN (1) CN110633713A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844743A * 2017-09-28 2018-03-27 Zhejiang Gongshang University Automatic image multi-caption generation method based on a multi-scale hierarchical residual network
CN108009493A * 2017-11-30 2018-05-08 University of Electronic Science and Technology of China Face anti-spoofing recognition method based on motion enhancement
CN108805080A * 2018-06-12 2018-11-13 Shanghai Jiao Tong University Context-based multi-level deep recurrent network method for group behavior recognition
CN108921796A * 2018-06-07 2018-11-30 Xidian University Infrared image non-uniformity correction method based on deep learning
CN109190472A * 2018-07-28 2019-01-11 Tianjin University Pedestrian attribute recognition method based on joint image and attribute guidance
CN109670164A * 2018-04-11 2019-04-23 东莞迪赛软件技术有限公司 Health-related public opinion analysis method based on deep multi-word-embedding Bi-LSTM residual networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Yinong et al., "Research on recognition algorithm for workers' abnormal behavior based on two-stream convolutional networks", Journal of University of Science and Technology Liaoning *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191231