CN112434145A

CN112434145A - Picture-viewing poetry method based on image recognition and natural language processing

Info

Publication number: CN112434145A
Application number: CN202011333715.0A
Authority: CN
Inventors: 李雪威; 解向川; 雷松源; 陈志超; 童跃凡; 任艺丹; 徐天一; 赵满坤; 高洁; 刘志强
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2021-03-02

Abstract

The invention relates to a picture-viewing poetry method based on image recognition and natural language processing, characterized in that the method comprises the following steps: S1, collecting and processing an image data set; S2, establishing an image recognition model for extracting keywords related to the image; S3, testing the image recognition effect; S4, collecting and processing a poetry data set; S5, establishing a keyword-poem matching model; and S6, establishing a poem-writing model. The design of the invention is scientific and reasonable: keywords are obtained by processing images with a VGG16 neural network, the best-matching poem is computed with a bag-of-words model, TF-IDF and cosine similarity, and new poems are composed with an LSTM neural network. After a picture input by the user is processed, the output obtained by segmenting the data set and optimizing the keywords is linguistically meaningful to a certain degree, and the results are good.

Description

Picture-viewing poetry method based on image recognition and natural language processing

Technical Field

The invention belongs to the fields of natural language processing and image recognition, and particularly relates to a picture-viewing poetry method based on image recognition and natural language processing.

Background

Image recognition is a computer vision technique in which a computer processes an unknown image and recognizes the relevant information it contains. In general, image recognition can be simplified into four steps: image acquisition, image preprocessing, feature extraction and image identification. Feature extraction converts the information contained in an image into feature vectors that a computer can conveniently process for a specific recognition task, and it is typically performed with convolutional neural networks (CNNs). A CNN is a deep feedforward neural network built on convolution operations; each of its artificial neurons responds only to the units within a local region (its receptive field), which gives CNNs excellent performance on large-scale image processing. After feature extraction, the information in the image is represented as a series of feature vectors, and recognition becomes the task of classifying these feature vectors.
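To make the idea of a local receptive field concrete, the following is a minimal pure-Python sketch of one 3 × 3 convolution (implemented, as in most deep-learning libraries, as cross-correlation with no padding and stride 1) followed by a ReLU. The function names, the step image and the edge-detecting kernel are illustrative assumptions, not part of the invention.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Each output value depends only on the kh x kw patch under the kernel,
    i.e. on that output neuron's local receptive field.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU non-linearity applied to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


# A dark-to-bright step image and a vertical-edge detector (illustrative values).
IMAGE = [[0, 0, 0, 1, 1]] * 3
VERTICAL_EDGE = [[-1, 0, 1]] * 3

if __name__ == "__main__":
    print(relu(conv2d_valid(IMAGE, VERTICAL_EDGE)))  # [[0.0, 3.0, 3.0]]
```

The non-zero responses appear only where the kernel's 3 × 3 window overlaps the edge, which is the locality the paragraph above describes.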

TF-IDF is a weighting technique commonly used in information retrieval and text mining to evaluate how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases as the word appears more frequently across the corpus.
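The weighting just described can be written as tfidf(t, d) = tf(t, d) · idf(t), where tf(t, d) is the count of term t in document d divided by the length of d, and idf(t) = log(N / df(t)) for N documents of which df(t) contain t. The sketch below implements this common textbook variant in pure Python. The patent does not state which smoothing or normalization it uses (libraries such as scikit-learn apply a smoothed idf by default), so the exact formulas here are an assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hand-segmented toy corpus (illustrative only).
    corpus = [["明月", "松间", "照"], ["明月", "何时", "有"], ["清泉", "石上", "流"]]
    for w in tf_idf(corpus):
        print(w)
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to distinguishing documents.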

A recurrent neural network (RNN) is a class of recursive neural networks that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain. Long short-term memory (LSTM) is a recurrent neural network architecture that, at each time step, forgets part of the information in its cell state and writes in new information, so that information useful for later computation is passed on while useless information is discarded; it outputs a hidden state at every time step.
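The gating described above can be sketched as a single LSTM time step. This follows the standard LSTM formulation (forget gate f, input gate i, candidate g, output gate o), written in pure Python for a tiny hidden size. The weight layout (each gate's matrix acting on the concatenation [h_prev, x]) and all names are assumptions for illustration, not the patent's implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _affine(w: List[Vector], b: Vector, v: Vector) -> Vector:
    """Compute w @ v + b for a row-major matrix w."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(
    x: Vector,
    h_prev: Vector,
    c_prev: Vector,
    weights: Dict[str, List[Vector]],
    biases: Dict[str, Vector],
) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new (hidden state, cell state)."""
    v = h_prev + x  # list concatenation [h_prev, x]
    f = [_sigmoid(z) for z in _affine(weights["f"], biases["f"], v)]  # forget gate
    i = [_sigmoid(z) for z in _affine(weights["i"], biases["i"], v)]  # input gate
    g = [math.tanh(z) for z in _affine(weights["g"], biases["g"], v)]  # candidate values
    o = [_sigmoid(z) for z in _affine(weights["o"], biases["o"], v)]  # output gate
    # Forget part of the old cell state and write in new information.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # A hidden state is emitted at every time step.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

For example, with hidden size 1, input size 1 and all weights and biases zero, every gate equals 0.5 and the candidate is 0, so the cell state is simply halved at each step: `lstm_step([1.0], [0.0], [1.0], W, B)` returns c = [0.5] and h = [0.5 · tanh(0.5)].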

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a picture-viewing poetry method based on image recognition and natural language processing, in which images are processed by a VGG16 neural network to obtain keywords, the best-matching poem is computed with a bag-of-words model, TF-IDF and cosine similarity, and new poems are composed by an LSTM neural network. After a picture input by the user is processed, the output obtained by segmenting the data set and optimizing the keywords is linguistically meaningful to a certain degree, and the results are good.

The invention solves this technical problem with the following technical scheme:

A picture-viewing poetry method based on image recognition and natural language processing, characterized in that the method comprises the following steps:

S1, collecting and processing an image data set;

S2, establishing an image recognition model for extracting keywords related to the image;

S3, testing the image recognition effect;

S4, collecting and processing a poetry data set;

S5, establishing a keyword and poem matching model for obtaining poems that match the image;

and S6, establishing a poem-writing model for creating poems related to the keywords.

Further, the specific steps of collecting and processing the image data set in step S1 are as follows:

a. collecting pictures related to ancient poems with a web crawler;

b. manually screening the pictures to ensure their quality;

c. converting the original picture data set into files in TFRecord format for subsequent use.
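Step c stores the images as TFRecord files. In practice this is usually done with TensorFlow's `tf.io.TFRecordWriter`, writing serialized `tf.train.Example` protos. To show what the container itself looks like without requiring TensorFlow, the sketch below writes and reads the TFRecord framing directly: each record is a little-endian uint64 length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The payloads here are raw bytes rather than `tf.train.Example` protos, and the helper names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator

_MASK_DELTA = 0xA282EAD8


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord's masked CRC: rotate right by 15 bits, then add a constant (mod 2**32)."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + _MASK_DELTA) & 0xFFFFFFFF


def write_record(stream: BinaryIO, payload: bytes) -> None:
    """Append one framed record: length, CRC(length), payload, CRC(payload)."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield payloads one by one, verifying both checksums of every record."""
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", stream.read(4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for payload in (b"image-bytes-0", b"image-bytes-1"):
        write_record(buf, payload)
    buf.seek(0)
    print(list(read_records(buf)))
```

The per-record checksums are what let a reader detect a truncated or corrupted file instead of silently feeding bad images to training.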

Further, the specific steps of step S2 are: an image recognition model is built on the VGG16 model, a 16-layer CNN constructed by repeatedly stacking small 3 × 3 convolution kernels and 2 × 2 max-pooling layers.
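The usual argument for stacking small kernels, made in the original VGG paper, is that several stacked 3 × 3 convolutions cover the same receptive field as one larger kernel while using fewer weights and adding more non-linearities. The sketch below tracks the receptive field through a layer list with the standard recurrence r = r + (k - 1) · j, j = j · s (k = kernel size, s = stride, j = cumulative stride). The VGG16 layer list follows the published configuration; the helper names are assumptions.

```python
from typing import List, Tuple

# (kernel size, stride): 3x3 conv with stride 1, and 2x2 max pool with stride 2.
CONV: Tuple[int, int] = (3, 1)
POOL: Tuple[int, int] = (2, 2)

# VGG16 feature extractor: 13 conv layers in 5 blocks, each block followed by a max pool.
VGG16_FEATURES: List[Tuple[int, int]] = (
    [CONV] * 2 + [POOL]
    + [CONV] * 2 + [POOL]
    + [CONV] * 3 + [POOL]
    + [CONV] * 3 + [POOL]
    + [CONV] * 3 + [POOL]
)


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of one output unit after `layers`."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k - 1) input strides
        j *= s            # pooling multiplies the spacing between output units
    return r


def conv_weights(kernel: int, channels: int) -> int:
    """Weights of one kernel x kernel conv mapping `channels` -> `channels` (biases ignored)."""
    return kernel * kernel * channels * channels


if __name__ == "__main__":
    print(receptive_field([CONV] * 2))   # 5: two 3x3 convs cover a 5x5 patch
    print(receptive_field([CONV] * 3))   # 7: three 3x3 convs cover a 7x7 patch
    print(3 * conv_weights(3, 512), conv_weights(7, 512))  # 7077888 12845056
    print(receptive_field(VGG16_FEATURES))  # 212 pixels after the last pool
```

Three stacked 3 × 3 layers therefore see a 7 × 7 patch with roughly 55% of the weights of a single 7 × 7 layer.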

In step S3, the model is tested with the test-set pictures, and the recognition effect is judged from the predicted class of each picture and its probability.
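Judging a test picture by its classification result and probability typically means converting the network's final-layer outputs (logits) into probabilities with a softmax and reading off the most probable class. A minimal sketch follows; the label names are the example imagery keywords mentioned later in the description, and the logits are made-up inputs, not results reported in the patent.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


LABELS = ["chrysanthemum", "plum blossom", "moon"]

if __name__ == "__main__":
    print(top_k([2.0, 1.0, 0.1], LABELS, k=2))
```

With these illustrative logits the top prediction is "chrysanthemum" with a probability of about 0.659.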

In step S4, a large number of ancient poems are collected, preprocessing such as unifying the format and removing abnormal data is applied to them, and the processed data is used as a corpus.
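The patent names "unifying the format and removing abnormal data" without listing the exact rules. The sketch below shows one plausible set of rules, all of them assumptions: map half-width punctuation to full-width, strip whitespace, drop poems that contain placeholder or annotation characters, and drop poems whose lines are not all five or all seven characters long (regular Tang-style verse).

```python
import re
from typing import Iterable, List, Optional

# Half-width -> full-width punctuation commonly found in scraped poem text (assumed mapping).
_PUNCT = str.maketrans({",": "，", ".": "。", "?": "？", "!": "！", ";": "；", ":": "："})
# Characters that usually signal missing or annotated text (assumed list).
_ABNORMAL = set("□_()（）[]")
_LINE_SPLIT = re.compile(r"[，。？！；]")


def normalize(poem: str) -> str:
    """Unify the format: full-width punctuation and no whitespace."""
    return re.sub(r"\s+", "", poem.translate(_PUNCT))


def clean_poem(poem: str, allowed_line_lengths=(5, 7)) -> Optional[str]:
    """Return the normalized poem, or None if it looks abnormal."""
    text = normalize(poem)
    if not text or any(ch in _ABNORMAL for ch in text):
        return None
    lines = [seg for seg in _LINE_SPLIT.split(text) if seg]
    if not lines:
        return None
    lengths = {len(seg) for seg in lines}
    # Keep only poems whose lines all share one allowed length.
    if len(lengths) != 1 or lengths.pop() not in allowed_line_lengths:
        return None
    return text


def build_corpus(poems: Iterable[str]) -> List[str]:
    """Normalize every poem and drop the ones flagged as abnormal."""
    return [p for p in (clean_poem(x) for x in poems) if p is not None]
```

For example, `clean_poem("床前明月光, 疑是地上霜.")` returns `"床前明月光，疑是地上霜。"`, while a poem containing the placeholder "□" is dropped. Restricting line lengths suits regular verse; a corpus that also includes ci would need a looser rule.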

Further, matching the keywords with the poems in step S5 specifically comprises the following steps:

a. segmenting the poems into words, converting them into a word-frequency matrix with a bag-of-words model, and calling a TF-IDF model to compute the TF-IDF weight of each word in the word-frequency matrix;

b. calculating the similarity between the keywords and each poem;

c. combining the cosine similarity and the Jaccard coefficient with equal weights (half each) into a total similarity, sorting the poems by this value, and selecting the poems with the highest matching degree as the result.
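Steps a to c can be sketched as follows. Several details are assumptions because the patent does not specify them: the poems and keywords are taken as already segmented into words (the toy poems below are hand-segmented), TF-IDF uses the plain tf · log(N / df) formula, cosine similarity is computed between each poem's TF-IDF vector and a binary keyword vector, and the Jaccard coefficient compares the keyword set with each poem's word set. The equal 0.5 weighting follows the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document: tf = count / length, idf = log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    out = []
    for d in docs:
        counts = Counter(d)
        out.append({t: c / len(d) * math.log(n / df[t]) for t, c in counts.items()})
    return out


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_poems(keywords: Sequence[str], poems: List[List[str]],
               top_n: int = 1) -> List[Tuple[int, float]]:
    """Return (poem index, total similarity) pairs, best first."""
    vectors = tfidf_vectors(poems)
    kw_vec = {k: 1.0 for k in keywords}  # binary keyword vector (assumption)
    kw_set = set(keywords)
    scored = [
        (idx, 0.5 * cosine(kw_vec, vec) + 0.5 * jaccard(kw_set, set(poem)))
        for idx, (poem, vec) in enumerate(zip(poems, vectors))
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


POEMS = [
    ["采菊", "东篱", "下", "悠然", "见", "南山"],
    ["墙角", "数枝", "梅", "凌寒", "独自", "开"],
    ["举头", "望", "明月", "低头", "思", "故乡"],
]

if __name__ == "__main__":
    print(rank_poems(["明月", "故乡"], POEMS))  # poem 2 ranks first
```

The Jaccard term rewards exact keyword overlap even for words that occur in every poem (whose TF-IDF weight, and hence cosine contribution, is zero), which is one reason to blend the two measures.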

The advantages and beneficial effects of the invention are as follows:

1. In the picture-viewing poetry method based on image recognition and natural language processing, an image is processed by a VGG16 neural network to obtain keywords, the best-matching poem is computed with a bag-of-words model, TF-IDF and cosine similarity, and poems are composed by an LSTM neural network. After a picture input by the user is processed, the output obtained by segmenting the data set and optimizing the keywords is linguistically meaningful to a certain degree, and the results are good.

2. The picture-viewing poetry method based on image recognition and natural language processing can extract keywords for the objects present in an image and, based on these keywords, match existing ancient poems or create new poems as the user requires. The invention seeks to explore ways of giving computers a higher level of intelligence by fusing techniques from different fields; actual tests show that it meets the expected results to a certain extent and accomplishes the goal of composing poetry from a picture.

Drawings

FIG. 1 is a schematic diagram of a VGG16 structure according to the present invention;

FIG. 2 is a diagram comparing the structures of LSTM and RNN according to the present invention;

FIG. 3 is a schematic diagram illustrating the effect of the present invention;

FIG. 4 is a schematic diagram illustrating another effect of the present invention.

Detailed Description

The present invention is further illustrated by the following specific examples, which are illustrative rather than limiting and do not limit the scope of the invention.

A picture-viewing poetry method based on image recognition and natural language processing, characterized in that the method comprises the following specific steps:

Step 1: collecting and preprocessing the image data set

To build a reliable image recognition model that meets the requirements, the method places high demands on the training images: the model must be accurate and sufficiently sensitive to the imagery that frequently appears in poetry. It is therefore necessary to collect a sufficient number of high-quality images of the imagery commonly found in ancient poems.

Step 2: establishing an image recognition model

A good image recognition model directly determines the final result. The invention uses the VGG16 model proposed by the Visual Geometry Group of the University of Oxford: the required image recognition model is obtained by training this model on 70% of the collected image data set, and the remaining 30% is used to test the model and ensure its quality.
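A 70%/30% split of the image data set can be sketched as below. Splitting separately within each class (stratification), so that every imagery category keeps the same ratio, and the fixed random seed are assumptions; the patent gives only the proportions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (image path, class label)


def stratified_split(
    samples: Sequence[Sample], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (train, test), keeping `train_frac` of each class for training."""
    by_label: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label, key=str):  # deterministic class order
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


if __name__ == "__main__":
    # Hypothetical file names: 10 images for each of three imagery classes.
    samples = [(f"{label}_{i}.jpg", label)
               for label in ("chrysanthemum", "plum", "moon") for i in range(10)]
    train, test = stratified_split(samples)
    print(len(train), len(test))  # 21 9
```

Stratifying keeps a small class from being under-represented in the test set by chance, which matters when each poetry keyword has only a few hundred crawled images.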

Step 3: collecting and preprocessing poetry data

To train the poem-writing model and provide the poem-matching function, poetry data also needs to be collected. The invention uses the existing poetry data set chinese-poetry, described as "the most complete database of classical Chinese poetry and literature". It is the most comprehensive classical Chinese corpus database and contains about 55,000 Tang poems, 260,000 Song poems, 21,000 Song ci and other classical texts. Following the classification of the image data set, the poetry data set is also classified by the imagery the poems contain and stored in CSV files.

Step 4: establishing a keyword and poem matching model

The principle of the matching model is as follows: first, the TF-IDF weights of each poem are computed; then the similarity between the keywords produced by the image recognition model and each poem is computed, with the total similarity formed by weighting the cosine similarity and the Jaccard similarity at 50% each; the poems are ranked by their total similarity to the keywords, and the poem with the highest similarity is selected as the final result.

And 5: establishing poem-writing model

The invention uses the collected poetry data set to train a poem-writing model based on an LSTM network. The trained model generates a poem from the input keywords, but the format and punctuation of the generated output can be flawed, so the output is truncated to remove everything after the last full stop.
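A minimal sketch of keyword-seeded generation followed by the clean-up step: the keyword characters serve as the prefix, the next character is chosen repeatedly from a model's next-character distribution (greedily, or by temperature sampling), and the output is cut after the last full stop "。". The `next_probs` callable stands in for the trained LSTM and is hypothetical, as is the toy bigram table, which exists only to make the example runnable.

```python
import random
from typing import Callable, Dict, Optional

NextProbs = Callable[[str], Dict[str, float]]


def generate(prefix: str, next_probs: NextProbs, max_len: int,
             temperature: float = 0.0, rng: Optional[random.Random] = None) -> str:
    """Extend `prefix` one character at a time until it has `max_len` characters."""
    rng = rng or random.Random()
    text = prefix
    while len(text) < max_len:
        probs = next_probs(text)
        if not probs:
            break
        if temperature <= 0:
            nxt = max(probs, key=probs.get)  # greedy decoding
        else:
            # p ** (1 / T) re-weights the distribution like softmax(log p / T).
            chars = list(probs)
            weights = [probs[ch] ** (1.0 / temperature) for ch in chars]
            nxt = rng.choices(chars, weights=weights, k=1)[0]
        text += nxt
    return text


def cut_after_last_period(text: str, period: str = "。") -> str:
    """Drop everything after the last full stop; return the text unchanged if there is none."""
    idx = text.rfind(period)
    return text[: idx + 1] if idx != -1 else text


# Toy stand-in for the trained LSTM: a fixed bigram table (hypothetical).
_BIGRAM = {"月": {"光": 1.0}, "光": {"明": 0.9, "。": 0.1}, "明": {"。": 1.0},
           "。": {"床": 1.0}, "床": {"前": 1.0}, "前": {"月": 1.0}}


def toy_model(text: str) -> Dict[str, float]:
    return _BIGRAM.get(text[-1], {})


if __name__ == "__main__":
    raw = generate("月", toy_model, max_len=8)
    print(raw, "->", cut_after_last_period(raw))  # 月光明。床前月光 -> 月光明。
```

Generation stops at a fixed length, so the last line is usually unfinished; cutting at the last "。" is a cheap way to return only complete lines, as the description states.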

The VGG16 model is a convolutional neural network proposed by Simonyan and Zisserman of the Visual Geometry Group at the University of Oxford; it comprises 13 convolutional layers and 3 fully connected layers. VGG16 performs well in image recognition, which is why it was chosen. The network structure is shown in FIG. 1.
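The 13 + 3 layer structure can be checked by counting parameters. The sketch below walks the published VGG16 configuration for a 224 × 224 × 3 input (3 × 3 convolutions with padding 1 keep the spatial size, and each 2 × 2 max pool halves it) and counts weights and biases. The number of output classes is a parameter, since a classifier head for the invention's imagery categories would replace ImageNet's 1000 classes; the function name is an assumption.

```python
from typing import List, Tuple, Union

# Output channels of each 3x3 conv; "M" marks a 2x2 max pool with stride 2.
VGG16_CFG: List[Union[int, str]] = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                                    512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(num_classes: int = 1000, image_size: int = 224,
                  in_channels: int = 3) -> Tuple[int, int, int, int]:
    """Return (conv layers, fc layers, final feature-map size, total parameters)."""
    params, convs, size, channels = 0, 0, image_size, in_channels
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # 2x2 max pool, stride 2
        else:
            params += 3 * 3 * channels * item + item  # conv weights + biases
            channels = item
            convs += 1
    # Three fully connected layers: flattened features -> 4096 -> 4096 -> classes.
    fc_dims = [size * size * channels, 4096, 4096, num_classes]
    for fan_in, fan_out in zip(fc_dims, fc_dims[1:]):
        params += fan_in * fan_out + fan_out
    return convs, len(fc_dims) - 1, size, params


if __name__ == "__main__":
    print(vgg16_summary())  # (13, 3, 7, 138357544)
```

Only the 13 convolutional and 3 fully connected layers carry weights, which is where the "16" in VGG16 comes from; the pooling layers are not counted.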

When training the VGG16 network, the self-crawled image data set, made into two parts of TFRecord files, is used as input, with common poetry imagery keywords such as chrysanthemum, plum blossom and moon as the specified keywords. Training is iterative, with a learning rate of 0.0001 and a batch size of 64, for 5000 steps in total, yielding two trained network models.
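The stated hyperparameters can be collected into a small configuration, which also makes the amount of data seen during training explicit: 5000 steps × 64 images = 320,000 image presentations. The `TrainConfig` class and the batch sampler below are hypothetical helpers; the sampler reshuffles the training indices at every epoch boundary so that each step gets a full batch. The patent does not give the number of training images, so the 3200-image figure in the example is made up.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 1e-4
    batch_size: int = 64
    steps: int = 5000

    def images_seen(self) -> int:
        """Total image presentations over the whole run."""
        return self.batch_size * self.steps

    def epochs(self, num_train_images: int) -> float:
        """How many passes over the training set the run corresponds to."""
        return self.images_seen() / num_train_images


def batch_indices(num_items: int, cfg: TrainConfig, seed: int = 0) -> Iterator[List[int]]:
    """Yield `cfg.steps` full batches of indices, reshuffling at every epoch boundary."""
    if num_items <= 0:
        raise ValueError("need at least one training item")
    rng = random.Random(seed)
    order: List[int] = []
    pos = 0
    for _ in range(cfg.steps):
        batch: List[int] = []
        while len(batch) < cfg.batch_size:
            if pos == len(order):  # epoch exhausted (or first call): reshuffle
                order = list(range(num_items))
                rng.shuffle(order)
                pos = 0
            take = min(cfg.batch_size - len(batch), len(order) - pos)
            batch.extend(order[pos:pos + take])
            pos += take
        yield batch


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.images_seen())  # 320000
    print(cfg.epochs(3200))   # 100.0 for a hypothetical 3200-image training set
```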

The invention constructs a test set in the same way and uses it to evaluate the VGG16 image recognition models. The main purpose of computing test-set accuracy is to assess how a model performs on data outside the training set and whether it overfits. Both recognition models reach an accuracy above 90%, showing that they also perform well on the test set.
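Test-set accuracy, and the gap between training and test accuracy as a rough overfitting signal, can be computed as below. The 5-percentage-point threshold for flagging overfitting is an arbitrary assumption for illustration; the patent reports only that both models exceed 90% test accuracy.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equally long, non-empty label sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def overfitting_report(train_acc: float, test_acc: float, max_gap: float = 0.05) -> str:
    """Flag a large train/test accuracy gap as possible overfitting (threshold assumed)."""
    gap = train_acc - test_acc
    status = "possible overfitting" if gap > max_gap else "no strong sign of overfitting"
    return f"train={train_acc:.1%} test={test_acc:.1%} gap={gap:.1%}: {status}"


if __name__ == "__main__":
    preds = ["moon", "plum", "moon", "moon"]
    truth = ["moon", "plum", "plum", "moon"]
    print(accuracy(preds, truth))         # 0.75
    print(overfitting_report(0.95, 0.93))
```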

In the module that matches poems to keywords, the poems are segmented into words using natural language processing techniques, and a bag-of-words model converts the words in the text into a word-frequency matrix; a TF-IDF model is then called to compute the TF-IDF weight of each word in the word-frequency matrix; finally, the cosine similarity and the Jaccard coefficient between the keywords and each poem are computed. In this way, the poems that best match the keywords recognized from the image can be found with high accuracy.

The poem-writing model adopts an LSTM network, a special kind of RNN. Compared with a plain RNN, the LSTM alleviates the vanishing- and exploding-gradient problems in training on long sequences and therefore performs better on longer sequences. In practice, the LSTM network was able to write poems that meet the requirements from the given keywords. The main input-output differences between the LSTM structure and a general RNN are shown in FIG. 2.
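A back-of-the-envelope calculation shows why the LSTM helps on long sequences. In a plain RNN, the gradient flowing back through T steps is multiplied at each step by a factor of roughly w · tanh'(z), which is below 1 when |w| < 1, so it shrinks geometrically (and grows geometrically when the factor exceeds 1). Along the LSTM cell-state path c_t = f_t · c_{t-1} + i_t · g_t, the per-step factor is the forget gate f_t, which the network can keep close to 1. The sketch compares only these two scalar products under assumed constant factors (0.5 for the RNN, 0.99 for the forget gate); it ignores all other gradient paths and is an illustration, not a derivation.

```python
import math


def rnn_gradient_factor(w: float, preactivation: float, steps: int) -> float:
    """Product of per-step factors w * tanh'(z) for a scalar vanilla RNN."""
    per_step = w * (1.0 - math.tanh(preactivation) ** 2)  # tanh'(z) = 1 - tanh(z)^2
    return per_step ** steps


def lstm_cell_gradient_factor(forget_gate: float, steps: int) -> float:
    """Product of forget-gate values along the cell-state path dc_T / dc_0."""
    return forget_gate ** steps


if __name__ == "__main__":
    for t in (10, 50, 100):
        print(t, rnn_gradient_factor(0.5, 0.0, t), lstm_cell_gradient_factor(0.99, t))
```

After 50 steps the RNN factor is about 9e-16, while the cell-state factor is still about 0.6, so the error signal from early characters of a poem can still reach the weights.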

Taken together, the invention can match existing ancient poems or create new poems from a picture according to the user's needs. The actual results are shown in FIG. 3 and FIG. 4, and the picture-viewing poetry function is accomplished well.

In summary, the picture-viewing poetry method based on image recognition and natural language processing can extract keywords for the objects present in an image and, based on these keywords, match existing ancient poems or create new poems as the user requires. The invention seeks to explore ways of giving computers a higher level of intelligence by fusing techniques from different fields; actual tests show that it meets the expected results to a certain extent and accomplishes the goal of composing poetry from a picture.

Although the embodiments and drawings of the invention are disclosed for illustrative purposes, those skilled in the art will appreciate that various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims; therefore, the scope of the invention is not limited to what is disclosed in the embodiments and drawings.

Claims

1. A picture-viewing poetry method based on image recognition and natural language processing, characterized in that the method comprises the following steps:

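S1, collecting and processing an image data set;

S2, establishing an image recognition model for extracting keywords related to the image;

S3, testing the image recognition effect;

S4, collecting and processing a poetry data set;

S5, establishing a keyword and poem matching model for obtaining poems that match the image;

and S6, establishing a poem-writing model for creating poems related to the keywords.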

2. The picture-viewing poetry method based on image recognition and natural language processing according to claim 1, wherein the specific steps of collecting and processing the image data set in step S1 are as follows:

a. collecting pictures related to ancient poems with a web crawler;

b. manually screening the pictures to ensure their quality;

c. converting the original picture data set into files in TFRecord format for subsequent use.

3. The picture-viewing poetry method based on image recognition and natural language processing according to claim 1, wherein the specific steps of step S2 are: an image recognition model is built on the VGG16 model, a 16-layer CNN constructed by repeatedly stacking small 3 × 3 convolution kernels and 2 × 2 max-pooling layers.

4. The picture-viewing poetry method based on image recognition and natural language processing according to claim 1, wherein in step S3 the model is tested with the test-set pictures, and the recognition effect is judged from the picture classification result and its probability.

5. The picture-viewing poetry method based on image recognition and natural language processing according to claim 1, wherein step S4 comprises collecting a large number of ancient poems, applying preprocessing such as unifying the format and removing abnormal data, and using the processed data as a corpus.

6. The picture-viewing poetry method based on image recognition and natural language processing according to claim 1, wherein matching the keywords with the poems in step S5 specifically comprises the following steps:

a. segmenting the poems into words, and calling a TF-IDF model to compute the TF-IDF weight of each word in the word-frequency matrix;

b. calculating the similarity between the keywords and each poem;