CN109086270A - System and method for automatically composing poems based on classical-poetry corpus vectorization - Google Patents


Info

Publication number
CN109086270A
CN109086270A (application CN201810817519.7A; granted as CN109086270B)
Authority
CN
China
Prior art keywords
corpus
poem
word
vector
processing mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810817519.7A
Other languages
Chinese (zh)
Other versions
CN109086270B (en)
Inventor
铉静
何伟东
李良炎
何中市
吴琼
郭飞
张航
周泽寻
杜井龙
王路路
陈定定
许祥娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201810817519.7A priority Critical patent/CN109086270B/en
Publication of CN109086270A publication Critical patent/CN109086270A/en
Application granted granted Critical
Publication of CN109086270B publication Critical patent/CN109086270B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system and method for automatically composing poems based on classical-poetry corpus vectorization. The characters of classical poems are first converted into corpus vectors, and an LSTM network model is built and trained. Image words are then input to a corpus processing mechanism, which computes poem candidate words from the corpus vectors corresponding to each image word in the corpus vector library; the candidate words are input to the LSTM network model to obtain a poem draft. Finally, the poem in the draft that best satisfies the rhyme and level-and-oblique-tone (ping-ze) rules of the chosen verse form is selected as the finalized poem, which is output as the automatic composition result. Beneficial effects of the invention: the machine can fully learn the meaning and artistic conception in poems, so that when a poem is needed the trained neural network can produce a classical poem directly from input keywords; the ability to compose is acquired by learning from the experience of earlier poets, and the result is aesthetically pleasing while satisfying the rules of verse.

Description

System and method for automatically composing poems based on classical-poetry corpus vectorization
Technical field
The present invention relates to the technical field of computer-based automatic poem composition, and in particular to a system and method for automatically composing poems based on classical-poetry corpus vectorization.
Background technique
With the continued advance of computing technology and hardware computing power, artificial intelligence has come ever closer to people's expectations; for example, the Go program AlphaGo was able to surpass a world champion through computation. In creative and artistic fields, however, artificial intelligence is still unable to perform the relevant work. Classical Chinese poetry, for instance, is an art of language with a long history of artistic value and literary achievement. Classical poetry combines regularity with abstraction: each verse form prescribes its own level-and-oblique-tone (ping-ze) pattern and requires matching rhyme feet, and these strict rules give classical poetry its beauty of pronunciation and rhythm. At the same time, because Chinese culture is broad and profound, each character can carry multiple meanings and be understood differently by different readers. The creation of classical poetry therefore requires studying and absorbing the outstanding poems of earlier poets before a poem with beauty and artistic conception can be produced.
For computers and artificial intelligence, rule-governed work is easy to accomplish, but abstract creation and artistic beauty are the difficulties of machine poetry composition: 1. how to vectorize natural language into a form the machine can understand while preserving as much as possible of the information the language contains; 2. how to make these vectors computable, so that the computer can process natural language the way a human does; 3. how to construct a neural network model that properly characterizes the relationships in textual data at the lowest computational cost; 4. how to design the network's optimization method and hyperparameters to solve the training problem and improve the final quality of the model; 5. if poems are to be composed from an input picture, how to locate the scenery and theme in the picture and recognize the names of the objects; 6. when changing words during the level-and-oblique-tone and rhyme checks, how to retain the emotion of the sample originally generated by the machine. At present, parameters such as the neural network's learning rate and the model architecture can only be determined by accumulating experience through continued practice, so as to obtain a parameterized model suited to solving this problem.
Summary of the invention
To realize automatic machine poetry composition, the invention proposes a system and method for automatically composing poems based on classical-poetry corpus vectorization. Each character of the outstanding poems of history is converted into a corpus vector, and relationships are established between the corpus vectors, so that the machine can fully learn the meaning and artistic conception in the poems; when a poem is needed, the required keywords are input directly to the trained neural network to produce a classical poem. The ability to compose is thus acquired from the experience of earlier poets, and the result is aesthetically pleasing while satisfying the rules of verse.
To achieve the above objective, the specific technical solution adopted by the invention is as follows:
A system for automatically composing poems based on classical-poetry corpus vectorization, comprising a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem sorting mechanism;
the corpus processing mechanism converts characters into corpus vectors and performs operations on corpus vectors;
the corpus vector library stores the corpus vectors;
the LSTM network model generates the poem draft;
the poem sorting mechanism performs the rhyme and level-and-oblique-tone (ping-ze) processing of the poem draft;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem sorting mechanism are connected in sequence.
With the above design, the automatic composition system learns the linguistic relationships and habits between the words of poems through the LSTM network model. When a poem is needed, keywords are input to the system; after the corpus processing mechanism recognizes and decomposes the keywords, the LSTM network model produces a poem draft, and the poem sorting mechanism then selects the content that satisfies the rhyme and level-and-oblique-tone rules as the final poem text, yielding the composition result. During learning, the LSTM network model requires the outstanding poems of history to be processed by the corpus processing mechanism into corpus vectors and stored in the corpus vector library, so that the wording habits of a large number of outstanding poems are captured and more artistic poems can be produced.
In a further refinement, the LSTM network model is a network model composed of two serially connected LSTM layers; its optimization function is stochastic gradient descent, and its loss function is the cross entropy.
A network model of two serial LSTM layers can identify the relationships between words more accurately, but since the amount of data it produces is larger, the data volume of its computation results may be reduced appropriately.
Preferably, 20% of the data is dropped (dropout) after the LSTM network model's computation, the learning rate is 0.01, and the number of iterations is 700.
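The patent does not give the internal equations of its LSTM layers. As a rough illustration of what "two serially connected LSTM layers" computes, the sketch below runs a toy one-dimensional LSTM cell through two stacked layers, where the hidden state of layer 1 feeds layer 2. All weights, the 1-D sizing, and the function names are invented for illustration; a real model would use the vector dimensions, dropout, and training setup described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a toy 1-dimensional LSTM cell.
    W maps gate name -> (input weight, recurrent weight, bias)."""
    gates = {}
    for name in ("i", "f", "o", "g"):
        wx, wh, b = W[name]
        z = wx * x + wh * h_prev + b
        gates[name] = math.tanh(z) if name == "g" else sigmoid(z)
    c = gates["f"] * c_prev + gates["i"] * gates["g"]  # new cell state
    h = gates["o"] * math.tanh(c)                      # new hidden state
    return h, c

def two_layer_lstm(xs, W1, W2):
    """Run a sequence through two serially stacked LSTM layers:
    the hidden state of layer 1 is the input of layer 2."""
    h1 = c1 = h2 = c2 = 0.0
    outputs = []
    for x in xs:
        h1, c1 = lstm_step(x, h1, c1, W1)
        h2, c2 = lstm_step(h1, h2, c2, W2)
        outputs.append(h2)
    return outputs
```

In the patent's setting each input would be a pair of adjacent corpus vectors and the output h would encode their connection relationship.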
A method for automatically composing poems based on classical-poetry corpus vectorization, comprising the following steps:
S1: input classical poems to the corpus processing mechanism; the corpus processing mechanism converts the characters of the classical poems into corpus vectors and stores the corpus vectors in the corpus vector library;
S2: build the LSTM network model;
S3: input the corpus training set to the LSTM network model to complete its training;
S4: input image words to the corpus processing mechanism; the corpus processing mechanism computes the poem candidate words from the corpus vector corresponding to each image word in the corpus vector library;
S5: the corpus processing mechanism inputs the poem candidate words to the LSTM network model to obtain the poem draft;
S6: the poem sorting mechanism selects from the poem draft the poem that best satisfies the rhyme and level-and-oblique-tone rules of the verse form, obtaining the finalized poem, which is the automatic composition result.
With the above design, a large number of outstanding classical poems are input to the corpus processing mechanism, and each character of each poem is vectorized, for example with a skip-gram model, to obtain the corpus vectors, so that the computer can recognize the content related to each character. The LSTM network model is able to process the connection relationships between characters and thereby understand the meaning of each character and analyze the relationships between words; its training process is the process of learning the words. When training is complete, the LSTM network model can compose simple poems; after a poem draft is produced, the rhyme, level-and-oblique-tone, and other regular properties are processed, and the poem is finalized.
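The skip-gram model mentioned above trains on (center, context) pairs drawn from a window around each character. A minimal sketch of the pair extraction, assuming a window of one adjacent character on each side (consistent with the adjacent-character statistics of step S1.2; the function name is an illustrative assumption):

```python
def skipgram_pairs(poem, window=1):
    """Extract (center, context) training pairs from one line of verse."""
    chars = list(poem)
    pairs = []
    for i, center in enumerate(chars):
        # every neighbour within `window` positions is a context word
        for j in range(max(0, i - window), min(len(chars), i + window + 1)):
            if j != i:
                pairs.append((center, chars[j]))
    return pairs
```

Each pair would then drive one update of the corpus vectors in steps S1.5 and S1.6.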
In a further refinement, the specific content of step S1 is as follows:
S1.1: input classical poems to the corpus processing mechanism; the corpus processing mechanism splits out every character that appears in the classical poems, recorded as m non-repeated characters, where multiple occurrences of the same character are recorded as one non-repeated character;
S1.2: count the number of occurrences of each non-repeated character and its context, i.e. the adjacent characters with which it appears in each poem;
S1.3: the corpus processing mechanism assigns each non-repeated character a random n-dimensional vector, which is the corpus vector of that character, and stores the corpus vector in the corpus vector library, with n ∈ [180, 220] and n an integer;
S1.4: construct a Huffman tree comprising end nodes and intermediate nodes: every end node is a child of an intermediate node, every intermediate node has exactly 2 children, and each end node points to the corpus vector of one non-repeated character in the corpus vector library. The node value of an end node is the number of occurrences of the corresponding character, and the node value of each intermediate node is the sum of the node values of its children; the larger an end node's value, the closer it is to the root node, the root node being the intermediate node with the largest node value;
S1.5: the selection probability of the context (adjacent) words of corpus vector x on the Huffman tree is:
p(context | x) = Π_i p_i
where p_i is the probability that the i-th intermediate node on the path of the Huffman tree chooses its first child node; in the standard hierarchical-softmax form this is the sigmoid
p_i = σ(x · θ_i) = 1 / (1 + e^(−x · θ_i)),
with probability 1 − σ(x · θ_i) for the other child; x is the corpus vector input to the intermediate node, and θ_i is the weight of the corpus vector on the i-th intermediate node;
S1.6: using the gradient method, partial derivatives with respect to θ_i and x are taken repeatedly (writing d_i = 1 if the path passes through the first child of node i and d_i = 0 otherwise):
first the partial derivative with respect to θ_i is computed,
∂ log p(context | x) / ∂θ_i = (d_i − σ(x · θ_i)) · x,
and θ_i is updated; the new θ_i is substituted into p(context | x) and the partial derivative with respect to x is then computed,
∂ log p(context | x) / ∂x = Σ_i (d_i − σ(x · θ_i)) · θ_i,
and the updated x is written back to the corpus vector library;
S1.7: choose a corpus vector x that has not yet been updated and return to step S1.5, until every corpus vector x in the corpus vector library has been updated once, yielding the new corpus vector library.
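The frequency-ordered Huffman tree of step S1.4 can be sketched with a standard heap-based construction. The function below is an illustrative reduction, not the patent's implementation: it returns each character's depth in the tree, so that more frequent characters end up closer to the root.

```python
import heapq
import itertools

def build_huffman(freqs):
    """freqs: dict character -> occurrence count.
    Returns dict character -> depth of its end node below the root."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(count, next(counter), {w: 0}) for w, count in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)   # two smallest subtrees
        c2, _, d2 = heapq.heappop(heap)
        # merging pushes every leaf in both subtrees one level deeper
        merged = {w: depth + 1 for w, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next(counter), merged))
    return heap[0][2]
```

With the embodiment's example counts, a character occurring twice is placed no deeper than any character occurring once.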
With the above design, the n-dimensional vector of each non-repeated character is initially random, but after the derivative operations the content of the corpus vector comes into one-to-one correspondence with the Huffman tree; that is, each corpus vector contains information about the character's frequency in the large input corpus of classical poems and about its adjacent characters in each poem. The larger n is, the richer the corresponding information, but the larger the computation; taking partial derivatives with respect to x and θ_i records the path through the Huffman tree more accurately, making the computation more precise.
In a further refinement, the corpus training set is the set formed by 80% of the corpus vectors in the corpus vector library, and the corpus vectors in the training set are ordered according to the word order of the corresponding classical poems;
the corpus training set is divided in a 9:1 ratio into a training corpus and a verification corpus, where the training corpus is used to train and adjust the parameter settings of the LSTM network model, and the verification corpus is used to check the trained and adjusted LSTM network model.
The data in the corpus vector library thus needs to be divided into a training corpus and a verification corpus: during training, the training corpus is input for learning first, and the verification corpus is then input to verify the learning effect, until the desired learning effect is reached.
In a further refinement, the image words of step S4 are the image words obtained by inputting a picture to an image feature extraction model, as follows:
S4.1: input a picture to the image feature extraction model, which extracts image words from the picture;
S4.2: the corpus processing mechanism matches each image word one by one to its corresponding corpus vector in the corpus vector library; those corpus vectors are the poem candidate words.
The image words can also be key words entered manually, with the corpus processing mechanism recognizing them and matching them to corpus vectors in the corpus vector library. Alternatively, a separate image feature extraction model can be provided that extracts the scenery in a picture and converts it into words, so that simply inputting a picture to the model yields the words for the key scenery in the picture; the corpus processing mechanism then processes the extracted words into poem candidate words.
In a further refinement, the image feature extraction model is an improved VGG-16 convolutional neural network model, comprising, connected in sequence, convolutional group 1, a pooling layer, convolutional group 2, a pooling layer, convolutional group 3, a pooling layer, convolutional group 4, a pooling layer, convolutional group 5, a pooling layer, 2 convolutional layers, a Bounding-box layer, and a Softmax layer, where convolutional groups 1 and 2 each consist of 2 concatenated convolutional layers, convolutional groups 3, 4, and 5 each consist of 3 concatenated convolutional layers, and every convolutional layer is also connected to the Bounding-box layer.
The traditional VGG-16 convolutional neural network structure is, in sequence, 2 convolutional layers, a pooling layer, 2 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 fully connected layers, and a Softmax layer. The improved VGG-16 model above replaces the 3 fully connected layers of traditional VGG-16 with 2 convolutional layers and a Bounding-box layer, and connects every convolutional layer directly to the Bounding-box layer, forming a fully convolutional network in which the Bounding-box layer adjusts the parameters of each convolutional layer. In addition, when the input picture is large and more scenery needs to be extracted, additional convolutional layers can be inserted before the Bounding-box layer.
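The layer arithmetic above can be made concrete with a symbolic layer list, in the style of VGG configuration tables; the token names ('C' for a convolutional layer, 'P' pooling, 'FC' fully connected, 'BB' Bounding-box, 'SM' Softmax) are invented for illustration. Traditional VGG-16 has 13 convolutional and 3 fully connected layers, while the improved model described here ends with 15 convolutional layers, a Bounding-box layer, and a Softmax layer:

```python
# Traditional VGG-16: five conv blocks with pooling, then the classifier head.
TRADITIONAL_VGG16 = (['C'] * 2 + ['P'] + ['C'] * 2 + ['P'] + ['C'] * 3 + ['P']
                     + ['C'] * 3 + ['P'] + ['C'] * 3 + ['P'] + ['FC'] * 3 + ['SM'])

# Improved model: the 3 FC layers become 2 extra conv layers feeding a
# Bounding-box layer (also fed by every conv layer) and a Softmax layer.
MODIFIED_VGG16 = (['C'] * 2 + ['P'] + ['C'] * 2 + ['P'] + ['C'] * 3 + ['P']
                  + ['C'] * 3 + ['P'] + ['C'] * 3 + ['P'] + ['C'] * 2 + ['BB', 'SM'])
```

The side connections from each convolutional layer to the Bounding-box layer are not representable in a flat list and would be extra edges in a real graph definition.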
In a further refinement, the image word of step S4 is a single input word A; the corpus processing mechanism computes subsequent associated words by association from word A, and word A together with the subsequent associated words forms a word string, which serves as the poem candidate words;
the method for computing a subsequent associated word is to find, from the corpus vector of the previous word, the word in the corpus vector library with the highest matching degree, the matching degree being computed as the cosine similarity
cos(a, b) = (a · b) / (‖a‖ · ‖b‖),
where a is the corpus vector of the previous word and b is the corpus vector of any word in the corpus vector library; the word whose corpus vector b maximizes cos(a, b) is taken as the next word.
Inputting image words can thus also be realized in this way: first input a word A, then compute through the corpus processing mechanism the word B in the corpus vector library with the highest matching degree to word A, then the word C with the highest matching degree to word B, and so on, finally obtaining several matched words that form the word string, which is then input to the LSTM network model to obtain the poem. In this mode only one prompt word is provided, and all subsequent content is computed entirely by automatic machine matching.
In a further refinement, after the classical poems are input in step S1, an image word table is also built that groups words of the same or similar meaning into classes; the poem candidate words of step S4 then comprise the input image words together with the words of the same or similar meaning in the image word table.
Because of the polysemy and near-synonymy of Chinese characters, the wording used to describe the same thing may differ between poems. By designing the image word table, words of the same or similar meaning form a class; when a word in a composed verse lacks aesthetic feeling, it can be adjusted by choosing a replacement from its class of same- or similar-meaning words.
In a further refinement, the verse form used for automatic composition is the seven-character regulated verse (qilü); its level-and-oblique-tone rule is one of the two standard qilü tonal templates, in which each of the 56 character positions across the eight lines is prescribed as level (平), oblique (仄), or flexible (中);
where 平 denotes a level tone, 仄 an oblique tone, and 中 a position that may take either tone.
The level-and-oblique-tone selection method of step S6 is then: the poem sorting mechanism compares the poem draft character by character with the tonal rule; wherever they disagree, the inconsistent character is replaced by a word of the same or similar meaning from the image word table, and the tonal rule is compared again, until the poem draft is fully consistent with the rule.
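A minimal sketch of the character-by-character tonal check and same-meaning replacement of step S6, using 'P', 'Z', and '*' for level, oblique, and flexible positions; the toy tone and synonym tables, and the simplification to a single pass (which suffices when every class contains a word of the required tone), are invented for illustration:

```python
def fix_tones(draft, pattern, tone_of, synonyms):
    """Compare the draft character by character with the tonal pattern
    ('P' level, 'Z' oblique, '*' either) and replace each mismatching
    character with a same-meaning word whose tone fits."""
    out = []
    for ch, want in zip(draft, pattern):
        if want == '*' or tone_of[ch] == want:
            out.append(ch)  # position already satisfies the rule
        else:
            # pick a replacement of the required tone from the word table
            out.append(next(s for s in synonyms[ch] if tone_of[s] == want))
    return ''.join(out)
```

A full implementation would iterate the compare-and-replace loop over the whole draft, as the patent describes, until no mismatch remains.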
Beneficial effects of the invention:
1. Recurrent neural networks, modeled on the features and neuronal connections of the human brain, learn natural language in a way very close to human learning; after the LSTM network model is introduced and trained on a large corpus, the machine obtains a good generative model for handling the logic of poems and the relationships between poems and images.
2. Convolutional neural networks perform outstandingly in object recognition; they extract most of the scenery features we need and provide rich keywords and image themes for poetry creation.
3. Because the word vectors are computed from the word frequencies and co-occurrences of the poetry corpus, and the co-occurrence of words reflects the relationships between them, the cosine of two word vectors reflects how close the words are. The method can therefore be used for rhyme-word and tonal replacement and for word-cloud expansion, combined with the classical-poetry word classification table, and is convenient and efficient to implement.
4. The image word table can be used in the keyword input step of machine generation and for expansion, solving the common problems of inconsistent and randomly jumping themes in machine-composed poems.
5. Introducing and elaborating the theme is a major feature of poetry and of human thinking during composition; the invention uses the word-string technique to let the machine simulate the human way of thinking, a useful practice in cognitive engineering that realizes, to some extent, the intelligence of human artistic creation within the machine composition task.
Detailed description of the invention
Fig. 1 is a schematic diagram of the system structure of the invention;
Fig. 2 is a schematic diagram of the structure of the LSTM network model of the embodiment;
Fig. 3 is the flow chart of the method of the invention;
Fig. 4 is the detailed flow chart of step S1;
Fig. 5 is a schematic diagram of the Huffman tree of the embodiment;
Fig. 6 is a schematic diagram of the improved VGG-16 convolutional neural network model structure of the invention;
Fig. 7 is a schematic diagram of the improved VGG-16 convolutional neural network model structure of the embodiment.
Specific embodiment
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, a system for automatically composing poems based on classical-poetry corpus vectorization comprises a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem sorting mechanism;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem sorting mechanism are connected in sequence.
The LSTM network model in this embodiment is preferably a network model composed of two serially connected LSTM layers, as shown in Fig. 2, where the upper and lower dashed boxes each indicate one LSTM layer and each a_(i,j) in the figure indicates a neuron. The inputs X1 and X2 are the corpus vectors of two sequentially adjacent characters in a classical poem, and the output h is the connection relationship between the two characters; that is, each input is a pair of adjacent characters, from which the LSTM network model learns the word relationships;
Preferably, the optimization function of the LSTM network model is stochastic gradient descent and the loss function is the cross entropy; 20% of the data is dropped (dropout) after the model's computation, the learning rate is 0.01, and the number of iterations is 700.
As shown in Fig. 3, a method for automatically composing poems based on classical-poetry corpus vectorization comprises the following steps:
S1: input classical poems to the corpus processing mechanism; the corpus processing mechanism converts the characters of the classical poems into corpus vectors and stores the corpus vectors in the corpus vector library;
S2: build the LSTM network model;
S3: input the corpus training set to the LSTM network model to complete its training;
S4: input image words to the corpus processing mechanism; the corpus processing mechanism computes the poem candidate words from the corpus vector corresponding to each image word in the corpus vector library;
S5: the corpus processing mechanism inputs the poem candidate words to the LSTM network model to obtain the poem draft;
S6: the poem sorting mechanism selects from the poem draft the poem that best satisfies the rhyme and level-and-oblique-tone rules of the verse form, obtaining the finalized poem, which is the automatic composition result.
The specific content of step S1 is shown in Fig. 4:
S1.1: input classical poems to the corpus processing mechanism; the corpus processing mechanism splits out every character that appears in the classical poems, recorded as m non-repeated characters, where multiple occurrences of the same character are recorded as one non-repeated character;
S1.2: count the number of occurrences of each non-repeated character and its context, i.e. the adjacent characters with which it appears in each poem;
S1.3: the corpus processing mechanism assigns each non-repeated character a random n-dimensional vector, which is the corpus vector of that character, and stores the corpus vector in the corpus vector library, with n ∈ [180, 220] and n an integer; this embodiment preferably takes n = 200;
S1.4: construct a Huffman tree comprising end nodes and intermediate nodes: every end node is a child of an intermediate node, every intermediate node has exactly 2 children, and each end node points to the corpus vector of one non-repeated character in the corpus vector library. The node value of an end node is the number of occurrences of the corresponding character, and the node value of each intermediate node is the sum of the node values of its children; the larger an end node's value, the closer it is to the root node, the root node being the intermediate node with the largest node value;
Preferably, this embodiment selects 3,000 seven-character poems, for example Li Bai's "Gazing at the Lushan Waterfall" ("Flying straight down three thousand chi, as if the Silver River were falling from the ninth heaven") and Du Fu's quatrain ("The window frames the thousand-autumn snow of the western ridge; at the door moor boats from eastern Wu, ten thousand li away"), and builds the Huffman tree from them. In these lines the character 千 ("thousand") appears twice while every other character appears only once, so the end node of 千 is closer to the root node than those of the other characters, and the node value of 千 is 2 while the others are 1, finally forming the Huffman tree shown in Fig. 5.
S1.5: the selection probability of the context (adjacent) words of corpus vector x on the Huffman tree is:
p(context | x) = Π_i p_i
where p_i is the probability that the i-th intermediate node on the path of the Huffman tree chooses its first child node; in the standard hierarchical-softmax form this is the sigmoid
p_i = σ(x · θ_i) = 1 / (1 + e^(−x · θ_i)),
with probability 1 − σ(x · θ_i) for the other child; x is the corpus vector input to the intermediate node, and θ_i is the weight of the corpus vector on the i-th intermediate node;
S1.6: using the gradient method, partial derivatives with respect to θ_i and x are taken repeatedly (writing d_i = 1 if the path passes through the first child of node i and d_i = 0 otherwise):
first the partial derivative with respect to θ_i is computed,
∂ log p(context | x) / ∂θ_i = (d_i − σ(x · θ_i)) · x,
and θ_i is updated; the new θ_i is substituted into p(context | x) and the partial derivative with respect to x is then computed,
∂ log p(context | x) / ∂x = Σ_i (d_i − σ(x · θ_i)) · θ_i,
and the updated x is written back to the corpus vector library;
S1.7: choose a corpus vector x that has not yet been updated and return to step S1.5, until every corpus vector x in the corpus vector library has been updated once, yielding the new corpus vector library.
The corpus training set used by this embodiment is the set formed by 80% of the corpus vectors in the corpus vector library, and the corpus vectors in the training set are ordered according to the word order of the corresponding classical poems;
the corpus training set is divided in a 9:1 ratio into a training corpus and a verification corpus, where the training corpus is used to train and adjust the parameter settings of the LSTM network model, and the verification corpus is used to check the trained and adjusted LSTM network model.
This embodiment adopts the picture-input mode of composition; that is, the image words of step S4 are the image words obtained by inputting a picture to the image feature extraction model, as follows:
S4.1: input a picture to the image feature extraction model, which extracts image words from the picture;
S4.2: the corpus processing mechanism matches each image word one by one to its corresponding corpus vector in the corpus vector library; those corpus vectors are the poem candidate words.
As shown in Fig. 6, the image feature extraction model is an improved VGG-16 convolutional neural network model comprising, connected in sequence: convolutional group 1, a pooling layer (Pool), convolutional group 2, a pooling layer, convolutional group 3, a pooling layer, convolutional group 4, a pooling layer, convolutional group 5, a pooling layer, 2 convolutional layers, a Bounding-box layer, and a Softmax layer. Convolutional groups 1 and 2 each consist of 2 concatenated convolutional layers (Conv); convolutional groups 3, 4, and 5 each consist of 3 concatenated convolutional layers; and each convolutional layer is connected to the Bounding-box layer.
Preferably, the improved VGG-16 convolutional neural network model of the present embodiment has the structure shown in Fig. 7. The dashed portion of the figure is the conventional part of the traditional VGG-16 structure: 2 convolutional layers, a pooling layer, 2 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, and a pooling layer, connected in sequence. This is followed by 6 sequentially connected convolutional layers and finally the Bounding-box and Softmax layers. Compared with the structure of Fig. 6, the structure of Fig. 7 adds 4 convolutional layers before the Bounding-box layer so as to obtain more image features. The convolution kernel of every convolutional layer is 3 × 3, and every pooling layer is 2 × 2.
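The spatial geometry of the model can be checked with a small sketch, assuming a 224 × 224 input (the standard VGG-16 size, which the patent does not state) and the 2 extra convolutional layers of the Fig. 6 description:

```python
def out_size(size, layers):
    """Spatial size after a sequence of 3x3 convs (stride 1, pad 1,
    size-preserving) and 2x2 max pools (stride 2, size-halving)."""
    for kind in layers:
        if kind == "pool2x2":
            size //= 2
        # "conv3x3" with stride 1 and padding 1 leaves the size unchanged
    return size

# Conventional VGG-16 convolutional part (dashed portion of Fig. 7)
vgg16_part = (["conv3x3"] * 2 + ["pool2x2"]
              + ["conv3x3"] * 2 + ["pool2x2"]
              + (["conv3x3"] * 3 + ["pool2x2"]) * 3)
extra = ["conv3x3"] * 2   # extra convs before the Bounding-box/Softmax layers

assert out_size(224, vgg16_part + extra) == 7   # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

Because all the extra layers are 3 × 3 convolutions with size-preserving padding, they enrich the features without shrinking the 7 × 7 feature map further.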
Embodiment two: the image word of step S4 is a single input word A. The corpus processing mechanism computes subsequent associated words from word A by association; word A and the subsequent associated words form a word string, and this word string constitutes the poem candidate words;
The method of computing a subsequent associated word is to find, from the corpus vector of the previous word, the next word with the highest matching degree in the corpus vector library. The matching degree is computed as the cosine similarity:

cos(a, b) = (a · b) / (‖a‖ ‖b‖)

where a is the corpus vector of the previous word and b is the corpus vector of any word in the corpus vector library; the word whose corpus vector b maximizes cos(a, b) is the next word.
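A minimal sketch of the matching-degree selection follows. The three-word library and 2-dimensional vectors are toy stand-ins for the n-dimensional corpus vectors of the library:

```python
import math

def cos(a, b):
    """Cosine similarity between two corpus vectors: a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def next_word(prev_vec, library):
    """Pick the library word whose corpus vector maximizes cos(a, b)."""
    return max(library, key=lambda w: cos(prev_vec, library[w]))

# Toy corpus vector library (word -> corpus vector)
library = {"moon": [0.9, 0.1], "wind": [0.2, 0.8], "frost": [0.7, 0.3]}
assert next_word([1.0, 0.2], library) == "moon"
```

Iterating `next_word` on each newly chosen vector yields the word string of subsequent associated words.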
Words of the same or similar meaning may also be grouped, after the classical poems are input in step S1, to build an image word library; the poem candidate words of step S4 then include the input image words together with the words of the same or similar meaning in the image word library.
The style of the automatically composed poem is the seven-character regulated verse (qilü), with the level-and-oblique (pingze) tonal rules: "中平中仄仄平平，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。" or "中仄平平仄仄平，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。";
Here 平 indicates that the character's tone must be level, 仄 that it must be oblique, and 中 that either tone is acceptable;
The level-and-oblique selection method of step S6 is then: the poem screening mechanism compares the poem draft with the tonal rules character by character; for any inconsistent character, it substitutes a word of the same or similar meaning from the image word library and compares against the tonal rules again, until the poem draft is fully consistent with the tonal rules.
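The character-by-character comparison can be sketched as follows. This is a toy checker: real use would also need each candidate word's tone category, which the patent leaves to the image word library.

```python
def matches(pattern, tones):
    """Check a line's tones against a level-and-oblique pattern.
    pattern: one of 平/仄/中 per character position;
    tones:   the actual tone, 平 or 仄, of each character in the draft.
    中 in the pattern accepts either tone."""
    return len(pattern) == len(tones) and all(
        p == "中" or p == t for p, t in zip(pattern, tones))

# A draft line whose tones fit the first pattern of the rule above
assert matches("中平中仄仄平平", "仄平平仄仄平平")
# A draft line that violates position 2 (平 required, 仄 found)
assert not matches("中平中仄仄平平", "仄仄平仄仄平平")
```

A full screening pass would run this check on every line of the draft, replacing offending characters from the image word library until every line returns `True`.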

Claims (10)

1. A system for automatic poem composition based on classical-poetry corpus vectorization, characterized by comprising: a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem screening mechanism;
the corpus processing mechanism is used for converting corpora into corpus vectors and operating on corpus vectors;
the corpus vector library is used for storing corpus vectors;
the LSTM network model is used for generating poem drafts;
the poem screening mechanism is used for the rhyme and level-and-oblique processing of poem drafts;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem screening mechanism are connected in sequence.
2. The system for automatic poem composition based on classical-poetry corpus vectorization according to claim 1, characterized in that: the LSTM network model is a network model composed of two serial layers of LSTM structures; the optimization function of the LSTM network model is stochastic gradient descent, and its loss function is the cross-entropy.
3. The system for automatic poem composition based on classical-poetry corpus vectorization according to claim 2, characterized in that: the LSTM network model discards 20% of the total data after computation (dropout), with a learning rate of 0.01 and 700 iterations.
4. A method of automatic poem composition based on classical-poetry corpus vectorization, characterized by the following steps:
S1. Classical poems are input to the corpus processing mechanism, which converts the words of the poems into corpus vectors and stores the corpus vectors in the corpus vector library;
S2. The LSTM network model is built;
S3. The corpus training set is input to the LSTM network model to complete its training;
S4. Image words are input to the corpus processing mechanism, which computes the poem candidate words from the corpus vectors corresponding to each image word in the corpus vector library;
S5. The corpus processing mechanism inputs the poem candidate words to the LSTM network model to obtain a poem draft;
S6. The poem screening mechanism chooses, from the poem draft, the wording that best conforms to the rhyme and level-and-oblique rules of the verse form, obtaining the final poem, which is the result of automatic composition.
5. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that step S1's Particular content is as follows:
S1.1, input classic poetry to corpus processing mechanism, the corpus processing mechanism split each word occurred in classic poetry, note For m unduplicated words, the same unduplicated word is denoted as wherein occurring being greater than the primary same word;
S1.2 counts the frequency of occurrence of each unduplicated word and its word that the context that occurs in every first poem is adjacent;
S1.3, the corpus processing mechanism be each unduplicated word random n-dimensional vector is set, the n-dimensional vector be this not The corpus vector of duplicate word, by the corresponding deposit corpus vector library of the corpus vector, n ∈ [180,220], and n is integer;
S1.4, constructs Huffman tree, and the Huffman tree includes endpoint node and intermediate node, during each endpoint node is The child node of intermediate node, each intermediate node only have 2 child nodes, and each endpoint node is respectively directed in corpus vector library The corpus vector of one unduplicated word, and remember that the nodal value of the endpoint node is the frequency of occurrence of corresponding unduplicated word, often A intermediate node note nodal value is the nodal value summation of its child node, and the bigger endpoint node of nodal value is closer from root node, institute Stating root node is the maximum intermediate node of nodal value;
S1.5, the selection probability of the adjacent word of the context of any corpus vector x on the Huffman tree are as follows:
P (context | x)=Π pi
Wherein, piThe probability of its first child node is chosen for upper i-th of the intermediate node of Huffman tree:
X is the corpus vector of intermediate node input, θiWeight for the corpus vector inputted on i-th of intermediate node;
S1.6, using gradient descent method to x, θiLocal derviation is sought respectively:
First calculate θiLocal derviation:
By new θiIt is corresponding to be updated to the local derviation for calculating x after p (context | x) again:
Corresponding update of new x is arrived into corpus vector library;
S1.7 chooses a corpus vector x not updated and returns to step S1.5, again until each corpus in corpus vector library Vector x is all updated primary, obtains new corpus vector library.
6. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the corpus The set for the corpus vector composition that training set is in corpus vector library 80%, and the corpus vector in the corpus training set is according to right The word order in classic poetry is answered to sort;
The corpus training set is training corpus and verifying corpus according to the ratio cut partition of 9:1, and wherein training corpus is for training The parameter setting of LSTM network model is adjusted, verifying corpus proofreads trained LSTM network model adjusted for verifying.
7. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the step The image word of S4 is the image image word that input picture to image characteristics extraction model obtains, and the specific method is as follows:
S4.1, input picture to image characteristics extraction model, described image Feature Selection Model extract image word from image Language;
S4.2, the corpus processing mechanism are that the image word matches corresponding corpus vector in corpus vector library one by one, should Corpus vector is poem alternative word.
8. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 7, it is characterised in that: described image Feature Selection Model is improved VGG-16 convolutional neural networks model, including sequentially connected convolutional layer group 1, pond layer, volume Lamination group 2, pond layer, convolutional layer group 3, pond layer, convolutional layer group 4, pond layer, convolutional layer group 5, pond layer, 2 convolutional layers, Bounding-box layers and Softmax layers, wherein the convolutional layer group that the convolutional layer group 1, convolutional layer group 2 are concatenated by 2 At the convolutional layer that the convolutional layer group 3, convolutional layer group 4, convolutional layer group 5 are concatenated by 3 forms, and each convolutional layer connects It connects Bounding-box layers described.
9. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the step The image word of S4 is one words A of input, and subsequent association is calculated according to words A association in the corpus processing mechanism Word, words A and subsequent association word form word string, and the word string is poem alternative word;
The method of the calculated for subsequent conjunctive word is that matching degree is found out in corpus vector library according to the corpus vector of previous words most High next words, the calculating of matching degree are as follows:
Wherein, a be previous words corpus vector, b be corpus vector library in any words corpus vector, then meet cos (a, B) words corresponding to maximum corpus vector b is next words.
10. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: step S1 Image collection of tunes of poems also is established into the same or similar words classification of meaning after input classic poetry, poem alternative word packet described in step S4 The image word and its same or similar words of meaning in image collection of tunes of poems for including input.
CN201810817519.7A 2018-07-24 2018-07-24 Automatic poetry making system and method based on ancient poetry corpus vectorization Active CN109086270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810817519.7A CN109086270B (en) 2018-07-24 2018-07-24 Automatic poetry making system and method based on ancient poetry corpus vectorization

Publications (2)

Publication Number Publication Date
CN109086270A true CN109086270A (en) 2018-12-25
CN109086270B CN109086270B (en) 2022-03-01

Family

ID=64838256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810817519.7A Active CN109086270B (en) 2018-07-24 2018-07-24 Automatic poetry making system and method based on ancient poetry corpus vectorization

Country Status (1)

Country Link
CN (1) CN109086270B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1889366A (en) * 2006-07-13 2007-01-03 浙江大学 Hafman decoding method
CN104951554A (en) * 2015-06-29 2015-09-30 浙江大学 Method for matching landscape with verses according with artistic conception of landscape
CN105930318A (en) * 2016-04-11 2016-09-07 深圳大学 Word vector training method and system
CN105955964A (en) * 2016-06-13 2016-09-21 北京百度网讯科技有限公司 Method and apparatus for automatically generating poem
CN106569995A (en) * 2016-09-26 2017-04-19 天津大学 Method for automatically generating Chinese poetry based on corpus and metrical rule
CN107102981A (en) * 2016-02-19 2017-08-29 腾讯科技(深圳)有限公司 Term vector generation method and device
CN107291693A (en) * 2017-06-15 2017-10-24 广州赫炎大数据科技有限公司 A kind of semantic computation method for improving term vector model
CN107480132A (en) * 2017-07-25 2017-12-15 浙江工业大学 A kind of classic poetry generation method of image content-based
CN107832292A (en) * 2017-11-02 2018-03-23 合肥工业大学 A kind of conversion method based on the image of neural network model to Chinese ancient poetry
US20180190249A1 (en) * 2016-12-30 2018-07-05 Google Inc. Machine Learning to Generate Music from Text

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Qixin Wang et al.: "Chinese Song Iambics Generation with Neural Attention-based Model", arXiv:1604.06274v2 [cs.CL] *
Yulia Tsvetkov et al.: "Evaluation of Word Vector Representations by Subspace Alignment", Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing *
Zhou Changle et al.: "Computational research on Chinese classical poetry and couplets", Mind and Computation (《心智与计算》) *
Su Jinsong et al.: "Building a segmented corpus of the complete Song lyrics based on statistical word extraction and metrical rules", Journal of Chinese Information Processing (《中文信息学报》) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309510A (en) * 2019-07-02 2019-10-08 China Jiliang University Method for composing a poem on viewing a picture, based on C-S and GRU
CN110738061B (en) * 2019-10-17 2024-05-28 北京搜狐互联网信息服务有限公司 Ancient poetry generating method, device, equipment and storage medium
CN110738061A (en) * 2019-10-17 2020-01-31 北京搜狐互联网信息服务有限公司 Ancient poetry generation method, device and equipment and storage medium
CN111814488A (en) * 2020-07-22 2020-10-23 网易(杭州)网络有限公司 Poetry generation method and device, electronic equipment and readable storage medium
CN111814488B (en) * 2020-07-22 2024-06-07 网易(杭州)网络有限公司 Poem generation method and device, electronic equipment and readable storage medium
CN112101006A (en) * 2020-09-14 2020-12-18 中国平安人寿保险股份有限公司 Poetry generation method and device, computer equipment and storage medium
CN112257775A (en) * 2020-10-21 2021-01-22 东南大学 Method for composing poems from pictures based on a convolutional neural network and an unsupervised language model
CN112434145A (en) * 2020-11-25 2021-03-02 天津大学 Picture-viewing poetry method based on image recognition and natural language processing
CN112883710A (en) * 2021-01-13 2021-06-01 戴宇航 Method for optimizing poems authored by user
CN113051877A (en) * 2021-03-11 2021-06-29 杨虡 Text content generation method and device, electronic equipment and storage medium
CN113553822B (en) * 2021-07-30 2023-06-30 网易(杭州)网络有限公司 Ancient poetry generating model training, ancient poetry generating method, equipment and storage medium
CN113553822A (en) * 2021-07-30 2021-10-26 网易(杭州)网络有限公司 Ancient poetry generation model training method, ancient poetry generation equipment and storage medium
CN116070643A (en) * 2023-04-03 2023-05-05 武昌理工学院 Fixed style translation method and system from ancient text to English
CN116070643B (en) * 2023-04-03 2023-08-15 武昌理工学院 Fixed style translation method and system from ancient text to English

Also Published As

Publication number Publication date
CN109086270B (en) 2022-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant