CN109086270A - System and method of composing poem automatically based on classic poetry corpus vectorization - Google Patents
- Publication number
- CN109086270A (application CN201810817519.7A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- poem
- word
- vector
- processing mechanism
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a system and method for composing poems automatically based on the vectorization of a classic-poetry corpus. First, the words of classic poems are converted into corpus vectors. After an LSTM network model is built, it is trained on these vectors. Imagery words are then input to the corpus processing mechanism, which computes poem candidate words from the corpus vectors in the corpus vector library corresponding to each imagery word. The poem candidate words are input to the LSTM network model to obtain a poem draft, and the poem in the draft that best conforms to the rhyme and tonal-pattern rules of the verse form is finally selected as the final poem, i.e., the automatic composition result. Beneficial effects of the invention: the machine can fully learn the meaning and imagery in poems, so that when a poem is needed the trained neural network directly turns the input key words into a classic-style poem; the ability to compose is acquired from the experience of earlier poets, and the resulting poems are aesthetically pleasing while conforming to poetic rules.
Description
Technical field
The present invention relates to the technical field of computer-based poem composition, and in particular to a system and method for composing poems automatically based on classic-poetry corpus vectorization.
Background art
With the continuous advance of computer technology and hardware computing power, artificial intelligence has come ever closer to people's expectations: the robot AlphaGo, for example, can surpass the world Go champion through computation. In creative and artistic fields, however, artificial intelligence still cannot handle the work. Classical Chinese poetry, for instance, is an art of language with a long history of artistic value and literary achievement. Classic poems combine regularity with abstraction: each verse form has its own tonal-pattern (level and oblique tone) rules, and the rhyming feet must match; these strict rules give classic poems their beauty of sound and rhythm. At the same time, because Chinese culture is broad and profound, each word carries multiple meanings and is understood differently by different people. Creating a classic poem with beauty and imagery therefore requires first studying and absorbing the outstanding poems of earlier poets.
For computers and artificial intelligence, rule-governed work is easily accomplished, but abstract creation and artistic beauty are the difficulties of machine poem composition: 1. how to vectorize natural language into a form the machine can understand while preserving as much as possible of the information contained in the language; 2. how to make these vectors computable, so that the computer can process natural language the way humans do; 3. how to design a neural network model that best characterizes the relationships between textual data while minimizing computational cost; 4. how to design the network optimization method and hyperparameters to solve the training problem and improve the final performance of the model; 5. when composing a poem from an input picture, how to locate the scenery and theme in the picture and recognize the names of the objects; 6. how to preserve the emotion of the machine's original draft when words are changed during tonal-pattern and rhyme checking. At present, parameters such as the neural network learning rate and choices such as the model architecture can only be determined by accumulating experience through continuous practice, so as to obtain a parameter model suited to solving this problem.
Summary of the invention
To achieve automatic machine poem composition, the invention proposes a system and method for composing poems automatically based on classic-poetry corpus vectorization. Each word of the outstanding poems of history is converted into a corpus vector, and relationships are established between the corpus vectors, so that the machine can fully learn the meaning and imagery in the poems. When a poem is needed, the trained neural network directly turns the input key words into a classic-style poem; the ability to compose is acquired from the experience of earlier poets, and the resulting poems are aesthetically pleasing while conforming to poetic rules.
In order to achieve the above objectives, the specific technical solution adopted by the present invention is as follows:
A system for composing poems automatically based on classic-poetry corpus vectorization, comprising a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem sorting mechanism;
the corpus processing mechanism is used to convert words into corpus vectors and to operate on corpus vectors;
the corpus vector library is used to store corpus vectors;
the LSTM network model is used to generate poem drafts;
the poem sorting mechanism is used to apply rhyme and tonal-pattern processing to poem drafts;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem sorting mechanism are connected in sequence.
With the above design, the automatic composing system learns the relationships and usage habits between the words in poems through the LSTM network model. When a poem is needed, keywords are input to the system; after recognizing and decomposing the keywords, the corpus processing mechanism has the LSTM network model produce a poem draft, and the poem sorting mechanism then selects the content that conforms to the rhyme and tonal-pattern rules as the final poem text, giving the composition result. When the LSTM network model learns, the outstanding poems of history must first be processed by the corpus processing mechanism into corpus vectors and stored in the corpus vector library, so that the usage habits of a large number of outstanding poems are obtained and more artistically accomplished poems can be made.
Further, the LSTM network model is a network model composed of two serially connected LSTM layers; the optimization function of the LSTM network model is stochastic gradient descent, and the loss function is cross entropy.
A network model composed of two serially connected LSTM layers can identify the relationships between words more accurately, but because the volume of data it produces is larger, the data volume of its computed results may be reduced appropriately.
Preferably, after the LSTM network model computes, 20% of the data is dropped, the learning rate is 0.01, and the number of iterations is 700.
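As a rough illustration of what "two serially connected LSTM layers" means, the toy sketch below runs a sequence through two scalar LSTM cells, the hidden state of the first feeding the second. It is a minimal hand-rolled cell for intuition only; the patent's actual model dimensions, weights, and training loop are not specified at this level.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One step of a scalar (1-dimensional toy) LSTM cell.

    w holds weights for the input, forget, and output gates and the
    candidate value; each gate sees the input x and the previous
    hidden state h_prev."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c

def two_layer_lstm(xs, w1, w2):
    """Run a sequence through two serially connected LSTM cells:
    the hidden state of layer 1 is the input of layer 2."""
    h1 = c1 = h2 = c2 = 0.0
    for x in xs:
        h1, c1 = lstm_cell_step(x, h1, c1, w1)
        h2, c2 = lstm_cell_step(h1, h2, c2, w2)
    return h2
```

Because the output is a gated tanh, its magnitude always stays below 1 regardless of the weights.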
A method of composing poems automatically based on classic-poetry corpus vectorization, using the following steps:
S1, input classic poems to the corpus processing mechanism; the corpus processing mechanism converts the words of the classic poems into corpus vectors and stores the corpus vectors in the corpus vector library;
S2, build the LSTM network model;
S3, input the corpus training set to the LSTM network model to complete the training of the LSTM network model;
S4, input imagery words to the corpus processing mechanism; the corpus processing mechanism computes poem candidate words from the corpus vectors in the corpus vector library corresponding to each imagery word;
S5, the corpus processing mechanism inputs the poem candidate words into the LSTM network model to obtain a poem draft;
S6, the poem sorting mechanism selects from the poem draft, according to the rhyme and tonal-pattern rules of the verse form, the poem that best conforms to those rules, obtaining the final poem; the final poem is the automatic composition result.
With the above design, a large number of outstanding classic poems are input to the corpus processing mechanism, and every word of every poem is vectorized, for example with a skip-gram model, to obtain corpus vectors, so that the computer can recognize the content related to each word. The LSTM network model can process the relationships between pieces of text, and thereby understands the meaning of each word and analyzes the relationships between words; its training process is the process of learning words. When training is complete, the LSTM network model can compose simple poems; after a poem draft is made, it is reprocessed for regularity of rhyme and tonal pattern, and the poem is finally finalized.
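The S1–S6 flow above can be outlined as a chain of functions. Every body below is a deliberately trivial stub (the hash-based "vectors", the space-joined "draft", and the pass-through screening are all invented placeholders), shown only to make the data flow of the method concrete.

```python
# Stub walk-through of steps S1-S6; each function stands in for a
# mechanism described in the patent, not a real implementation.
def s1_vectorize(poems):
    """S1: map each character of each poem to a (fake) corpus vector."""
    return {w: [hash(w) % 7] for poem in poems for w in poem}

def s4_candidates(imagery_words, vector_library):
    """S4: keep the imagery words that have vectors in the library."""
    return [w for w in imagery_words if w in vector_library]

def s5_draft(candidates):
    """S5: the LSTM would generate a draft; here we just join the words."""
    return " ".join(candidates)

def s6_select(draft):
    """S6: rhyme / tonal-pattern screening, stubbed as a pass-through."""
    return draft

def compose(poems, imagery_words):
    # S2/S3 (building and training the LSTM model) are omitted entirely.
    library = s1_vectorize(poems)
    return s6_select(s5_draft(s4_candidates(imagery_words, library)))
```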
Further, the specific content of step S1 is as follows:
S1.1, input classic poems to the corpus processing mechanism; the corpus processing mechanism splits out every word that appears in the classic poems, recording m non-duplicated words, where a word occurring more than once is recorded as a single non-duplicated word;
S1.2, count the number of occurrences of each non-duplicated word and the context words adjacent to it in each poem;
S1.3, the corpus processing mechanism assigns each non-duplicated word a random n-dimensional vector, which is the corpus vector of that word, and stores the corpus vector correspondingly in the corpus vector library, with n ∈ [180, 220] and n an integer;
S1.4, construct a Huffman tree comprising endpoint nodes and intermediate nodes: every endpoint node is a child of an intermediate node, every intermediate node has exactly 2 children, and each endpoint node points to the corpus vector of one non-duplicated word in the corpus vector library; the node value of an endpoint node is the occurrence count of the corresponding non-duplicated word, the node value of an intermediate node is the sum of the node values of its children, endpoint nodes with larger node values lie closer to the root node, and the root node is the intermediate node with the largest node value;
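A Huffman tree with the properties of S1.4 (binary intermediate nodes, leaf values equal to word counts, frequent words nearer the root) can be built with the standard greedy merge; the word counts below are invented for illustration.

```python
import heapq
import itertools

def build_huffman(counts):
    """Build a Huffman tree from {word: occurrence count}.

    Leaves carry words; intermediate nodes carry the summed count of
    their two children, so frequent words end up closer to the root."""
    tiebreak = itertools.count()  # keeps heap comparisons away from the dicts
    heap = [(c, next(tiebreak), {"word": w, "count": c}) for w, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)   # two smallest subtrees...
        c2, _, right = heapq.heappop(heap)
        node = {"count": c1 + c2, "left": left, "right": right}
        heapq.heappush(heap, (c1 + c2, next(tiebreak), node))  # ...merged
    return heap[0][2]

def depth_of(node, word, d=0):
    """Depth of a word's leaf (root = 0); None if the word is absent."""
    if "word" in node:
        return d if node["word"] == word else None
    return depth_of(node["left"], word, d + 1) or depth_of(node["right"], word, d + 1)
```

With counts like {"千": 2, "山": 1, "水": 1, "月": 1}, the more frequent word can never end up deeper in the tree than a less frequent one.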
S1.5, the selection probability of the context-adjacent words of corpus vector x on the Huffman tree is:
p(context | x) = Π_i p_i
where p_i is the probability that the i-th intermediate node on the Huffman-tree path chooses its first child node, taken as a sigmoid of the input vector and the node weight:
p_i = σ(x·θ_i) = 1 / (1 + e^(−x·θ_i))
where x is the corpus vector input to the intermediate node and θ_i is the weight of the corpus vector at the i-th intermediate node;
S1.6, repeatedly take partial derivatives with respect to x and θ_i using gradient descent:
first compute the partial derivative with respect to θ_i and update θ_i:
∂ log p(context | x) / ∂θ_i = (1 − p_i) · x
then, after substituting the new θ_i into p(context | x), compute the partial derivative with respect to x:
∂ log p(context | x) / ∂x = Σ_i (1 − p_i) · θ_i
and update the new x into the corpus vector library;
S1.7, choose a corpus vector x that has not yet been updated and return to step S1.5, until every corpus vector x in the corpus vector library has been updated once, obtaining the new corpus vector library.
With the above design, the n-dimensional vector of each non-duplicated word is random at first, but after the derivative operations the content of each corpus vector comes into one-to-one correspondence with the Huffman tree; that is, each corpus vector carries the information of its word's frequency in the large input corpus of classic poems and of its adjacent words in each poem. The larger the value of n, the richer the corresponding information, but the larger the computation; taking the partial derivatives with respect to x and θ_i records the path through the Huffman tree more precisely, making the computation more accurate.
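Assuming the conventional word2vec-style hierarchical softmax that S1.5/S1.6 appears to describe (a sigmoid at each intermediate node, gradient steps on both the node weights θ_i and the word vector x), one update along a single Huffman path might look like the simplified sketch below, which treats every branch as the "first child" case and updates x and all θ_i in one pass:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def path_update(x, thetas, lr=0.025):
    """One S1.5/S1.6-style update along a Huffman path.

    x is a word's corpus vector; thetas are the weights of the
    intermediate nodes on its path. Returns the updated (x, thetas)
    and the path probability p(context | x) before the update."""
    ps = [sigmoid(dot(x, th)) for th in thetas]
    prob = 1.0
    for p in ps:                      # p(context | x) = product of p_i
        prob *= p
    # gradient of log p w.r.t. each theta_i is (1 - p_i) * x
    new_thetas = [[t + lr * (1 - p) * xi for t, xi in zip(th, x)]
                  for th, p in zip(thetas, ps)]
    # gradient of log p w.r.t. x is sum_i (1 - p_i) * theta_i
    grad_x = [sum((1 - p) * th[j] for th, p in zip(thetas, ps))
              for j in range(len(x))]
    new_x = [xi + lr * g for xi, g in zip(x, grad_x)]
    return new_x, new_thetas, prob
```

Repeating the update increases the path probability, which is the sense in which the word vector "learns" its context.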
Further, the corpus training set is the set composed of 80% of the corpus vectors in the corpus vector library, with the corpus vectors in the training set ordered according to the word order in the corresponding classic poems;
the corpus training set is divided into a training corpus and a verification corpus in a ratio of 9:1, where the training corpus is used to train and adjust the parameter settings of the LSTM network model, and the verification corpus is used to verify and proofread the adjusted, trained LSTM network model.
The data in the corpus vector library therefore need to be divided into a training corpus and a verification corpus: the training corpus is input first for learning, and the verification corpus is then input to verify the learning effect, until the desired learning effect is reached.
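The two-stage split above (80% of the library as training set, then 9:1 into training and verification corpora) can be sketched as contiguous slices, since the vectors must keep their word order:

```python
def split_corpus(vectors):
    """Split an ordered list of corpus vectors as described: 80% becomes
    the training set, which is further divided 9:1 into training and
    verification corpora. Order is preserved (word order matters here),
    so the splits are contiguous slices rather than random samples."""
    n_train_set = int(len(vectors) * 0.8)
    train_set = vectors[:n_train_set]
    n_train = int(len(train_set) * 0.9)
    return train_set[:n_train], train_set[n_train:]
```

For a library of 100 vectors this yields 72 training and 8 verification vectors, with the remaining 20 held out of the training set entirely.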
Further, the imagery words of step S4 may be obtained by inputting a picture into an image feature extraction model; the specific method is as follows:
S4.1, input the picture into the image feature extraction model, which extracts imagery words from the picture;
S4.2, the corpus processing mechanism matches each imagery word one by one with its corresponding corpus vector in the corpus vector library; these corpus vectors are the poem candidate words.
The imagery words may be key words entered manually, in which case the corpus processing mechanism recognizes them and matches them to the corresponding corpus vectors in the corpus vector library. Alternatively, an image feature extraction model may be provided, which extracts the scenery in a picture and converts it into words; then only the picture needs to be input to obtain the words for the key scenery in it, and the corpus processing mechanism processes the extracted words to obtain the poem candidate words.
Further, the image feature extraction model is an improved VGG-16 convolutional neural network model, comprising, connected in sequence: convolutional layer group 1, a pooling layer, convolutional layer group 2, a pooling layer, convolutional layer group 3, a pooling layer, convolutional layer group 4, a pooling layer, convolutional layer group 5, a pooling layer, 2 convolutional layers, a Bounding-box layer, and a Softmax layer, where convolutional layer groups 1 and 2 each consist of 2 concatenated convolutional layers, convolutional layer groups 3, 4, and 5 each consist of 3 concatenated convolutional layers, and every convolutional layer is connected to the Bounding-box layer.
The traditional VGG-16 convolutional neural network structure is, in sequence: 2 convolutional layers, a pooling layer, 2 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 fully connected layers, and a Softmax layer. The improved VGG-16 model above replaces the 3 fully connected layers of the traditional VGG-16 with 2 convolutional layers and a Bounding-box layer, and connects every convolutional layer directly to the Bounding-box layer, forming a fully convolutional network in which the Bounding-box layer adjusts the parameters of each convolutional layer. In addition, when the input picture is large and more scenery needs to be extracted, additional convolutional layers can be added before the Bounding-box layer.
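Counting layers is an easy way to check the improved architecture against the traditional one. The sketch below encodes the layer sequence described above as a flat list of names (the names themselves are ours, not the patent's): the 13 group convolutions plus the 2 replacement convolutions give 15 in total, and `extra_convs` models the optional layers added before the Bounding-box layer for larger pictures.

```python
def improved_vgg16_layers(extra_convs=0):
    """Layer sequence of the improved VGG-16 described above, as a flat
    list of layer names. extra_convs models the optional convolutional
    layers added before the Bounding-box layer for larger pictures."""
    group_sizes = [2, 2, 3, 3, 3]            # conv groups 1..5
    layers = []
    for size in group_sizes:
        layers += ["conv"] * size + ["pool"]  # each group is followed by pooling
    layers += ["conv"] * 2                    # the 2 convs replacing VGG's FC layers
    layers += ["conv"] * extra_convs          # optional extras for larger inputs
    layers += ["bounding-box", "softmax"]
    return layers
```

With `extra_convs=4` this matches the embodiment of Fig. 7, which adds 4 convolutional layers before the Bounding-box layer.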
Further, the imagery word of step S4 may be a single input word A; the corpus processing mechanism computes subsequent associated words from word A, and word A together with its subsequent associated words forms a word string, which serves as the poem candidate words;
the method of computing a subsequent associated word is to find, from the corpus vector of the previous word, the best-matching next word in the corpus vector library, the matching degree being computed as the cosine similarity
cos(a, b) = (a · b) / (|a| · |b|)
where a is the corpus vector of the previous word and b is the corpus vector of any word in the corpus vector library; the word corresponding to the corpus vector b that maximizes cos(a, b) is the next word.
The imagery-word input can thus also be realized in this way: first input a word A, then use the corpus processing mechanism to compute the word B in the corpus vector library with the highest matching degree to word A, then the word C with the highest matching degree to word B, and so on, finally obtaining a word string of several matched words; the word string is then input to the LSTM network model to obtain a poem. In this way only one prompt word is provided, and all subsequent content is computed entirely by machine auto-matching.
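The word-string generation can be sketched directly from the cosine-matching rule; the tiny 2-dimensional "library" in the test is invented for illustration (real corpus vectors would be n-dimensional, n ∈ [180, 220]):

```python
import math

def cos(a, b):
    """Cosine similarity between two corpus vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def word_string(start, library, length=4):
    """Grow a word string from one prompt word: at each step pick the
    library word whose corpus vector has the highest cosine similarity
    to the previous word's vector, excluding words already chosen.
    `library` maps word -> corpus vector (the vectors would come from
    the corpus vector library built in step S1)."""
    chain = [start]
    while len(chain) < length:
        prev_vec = library[chain[-1]]
        candidates = {w: v for w, v in library.items() if w not in chain}
        if not candidates:
            break
        chain.append(max(candidates, key=lambda w: cos(prev_vec, candidates[w])))
    return chain
```

Excluding already-chosen words keeps the string from oscillating between two mutually closest words.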
Further, after the classic poems are input in step S1, words of the same or similar meaning are also classified to establish an imagery word table; the poem candidate words of step S4 then include the input imagery words together with the words of the same or similar meaning in the imagery word table.
Because of the polysemy of Chinese characters and the existence of near-synonyms, the wording used to describe the same thing may differ between poems. Designing an imagery word table groups words of the same or similar meaning into classes; when a verse composed from the input words lacks aesthetic appeal, the offending word can be adjusted by choosing a replacement from the same meaning class.
Further, the verse form composed automatically is the seven-character regulated verse (lüshi), whose tonal pattern follows one of the two standard templates: "中平中仄仄平平，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。" or "中仄平平仄仄平，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。"
wherein 平 and 仄 indicate that the character's tone must be level or oblique respectively, and 中 indicates a position where either tone is allowed;
the tonal-pattern selection method of step S6 is then as follows: the poem sorting mechanism compares the poem draft with the tonal-pattern rule character by character; for any character that is inconsistent, it substitutes a word of the same or similar meaning from the imagery word table and compares the tonal pattern again, until the poem draft is completely consistent with the tonal-pattern rule.
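The character-by-character tonal check and same-meaning substitution of step S6 can be sketched as follows; the tone dictionary and synonym table in the test are invented stand-ins for a real rhyme dictionary and the imagery word table:

```python
def conforms(line_tones, pattern):
    """Check one line of draft tones against a tonal-pattern template.
    Both arguments are strings over {'平', '仄'}; the pattern may also
    use '中' for positions where either tone is allowed."""
    return len(line_tones) == len(pattern) and all(
        p == "中" or p == t for t, p in zip(line_tones, pattern)
    )

def fix_line(line, pattern, synonym_table, tone_of):
    """S6-style repair sketch: for each character whose tone violates
    the pattern, try same-meaning replacements until the tone fits.
    synonym_table maps a character to candidate replacements; tone_of
    maps a character to '平' or '仄'."""
    out = list(line)
    for i, (ch, p) in enumerate(zip(line, pattern)):
        if p != "中" and tone_of(ch) != p:       # tonal violation at position i
            for alt in synonym_table.get(ch, []):
                if tone_of(alt) == p:            # first synonym with the right tone
                    out[i] = alt
                    break
    return "".join(out)
```

A full implementation would also re-check the repaired line and fall back to other meaning classes when no synonym fits, as the patent's loop describes.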
Beneficial effects of the present invention:
1. Recurrent neural networks, based on the features of the human brain and the connections of neurons, learn natural language in a way very close to human learning; after introducing the LSTM network model and training it on a large corpus, the machine can obtain a good generative model for handling the logic of poems and the relationships between poems and imagery.
2. Convolutional neural networks perform outstandingly in object recognition and can extract most of the scenery features we need, providing rich keywords and imagery themes for poem creation.
3. Because the word vectors are computed from the word frequencies and co-occurrences of the poem corpus, and the co-occurrence of words also reflects the relationships between words, the cosine of two word vectors reflects how closely the words are related. Rhyme-word and tonal-pattern replacement and word-cloud extension can therefore be carried out with this method, combined with the classic-poetry word classification table, and the implementation is convenient and efficient.
4. The imagery word table can be used in the keyword input step of machine generation and extended through the table, solving the common machine-composition problems of inconsistent themes and random topic jumps.
5. Introducing and elaborating the theme is a major feature of poetry and of the human thought process in composition; the present invention uses the word-string technique to let the machine simulate the human way of thinking, which is a beneficial practice in cognitive engineering and, to a certain extent, brings the intelligence of human artistic creation into the machine's composition task.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure of the invention;
Fig. 2 is a schematic diagram of the structure of the LSTM network model of the embodiment;
Fig. 3 is a flow chart of the method of the invention;
Fig. 4 is a detailed flow chart of step S1;
Fig. 5 is a schematic diagram of the Huffman tree of the embodiment;
Fig. 6 is a schematic diagram of the improved VGG-16 convolutional neural network model structure of the invention;
Fig. 7 is a schematic diagram of the improved VGG-16 convolutional neural network model structure of the embodiment.
Specific embodiment
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, a system for composing poems automatically based on classic-poetry corpus vectorization comprises a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem sorting mechanism;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem sorting mechanism are connected in sequence.
The LSTM network model in this embodiment is preferably a network model composed of two serially connected LSTM layers. As shown in Fig. 2, the upper and lower dotted boxes each indicate one LSTM layer, and each a_{i,j} in the figure indicates a neuron. The inputs X1 and X2 are the corpus vectors of two sequentially connected words in a classic poem, and the output h is the relationship between the two words; that is, each input is a pair of adjacent words, from which the LSTM network model learns the word relationships.
Preferably, the optimization function of the LSTM network model is stochastic gradient descent and the loss function is cross entropy; after the LSTM network model computes, 20% of the data is dropped, the learning rate is 0.01, and the number of iterations is 700.
As shown in Fig. 3, a method of composing poems automatically based on classic-poetry corpus vectorization uses the following steps:
S1, input classic poems to the corpus processing mechanism; the corpus processing mechanism converts the words of the classic poems into corpus vectors and stores the corpus vectors in the corpus vector library;
S2, build the LSTM network model;
S3, input the corpus training set to the LSTM network model to complete the training of the LSTM network model;
S4, input imagery words to the corpus processing mechanism; the corpus processing mechanism computes poem candidate words from the corpus vectors in the corpus vector library corresponding to each imagery word;
S5, the corpus processing mechanism inputs the poem candidate words into the LSTM network model to obtain a poem draft;
S6, the poem sorting mechanism selects from the poem draft, according to the rhyme and tonal-pattern rules of the verse form, the poem that best conforms to those rules, obtaining the final poem; the final poem is the automatic composition result.
Wherein, the specific content of step S1 is shown in Fig. 4:
S1.1, input classic poems to the corpus processing mechanism; the corpus processing mechanism splits out every word that appears in the classic poems, recording m non-duplicated words, where a word occurring more than once is recorded as a single non-duplicated word;
S1.2, count the number of occurrences of each non-duplicated word and the context words adjacent to it in each poem;
S1.3, the corpus processing mechanism assigns each non-duplicated word a random n-dimensional vector, which is the corpus vector of that word, and stores the corpus vector correspondingly in the corpus vector library, with n ∈ [180, 220] and n an integer; this embodiment preferably takes n = 200;
S1.4, construct a Huffman tree comprising endpoint nodes and intermediate nodes: every endpoint node is a child of an intermediate node, every intermediate node has exactly 2 children, and each endpoint node points to the corpus vector of one non-duplicated word in the corpus vector library; the node value of an endpoint node is the occurrence count of the corresponding non-duplicated word, the node value of an intermediate node is the sum of the node values of its children, endpoint nodes with larger node values lie closer to the root node, and the root node is the intermediate node with the largest node value.
Preferably, this embodiment selects two seven-character lines: from Li Bai's "Gazing at the Lushan Waterfall", "飞流直下三千尺，疑是银河落九天" ("Its torrent dashes straight down three thousand feet, as if the Silver River were falling from Heaven"), and from Du Fu's "Quatrain", "窗含西岭千秋雪，门泊东吴万里船" ("My window frames the West Ridge's thousand-autumn snow; at my gate moor boats from East Wu, ten thousand li away"). A Huffman tree is established from these lines: the character 千 occurs 2 times while every other character occurs only once, so the endpoint node of 千 is closer to the root node than those of the other characters, its node value being 2 while the others are 1, finally forming the Huffman tree shown in Fig. 5.
S1.5, the selection probability of the context-adjacent words of corpus vector x on the Huffman tree is:
p(context | x) = Π_i p_i
where p_i is the probability that the i-th intermediate node on the Huffman-tree path chooses its first child node, taken as a sigmoid of the input vector and the node weight:
p_i = σ(x·θ_i) = 1 / (1 + e^(−x·θ_i))
where x is the corpus vector input to the intermediate node and θ_i is the weight of the corpus vector at the i-th intermediate node;
S1.6, repeatedly take partial derivatives with respect to x and θ_i using gradient descent:
first compute the partial derivative with respect to θ_i and update θ_i:
∂ log p(context | x) / ∂θ_i = (1 − p_i) · x
then, after substituting the new θ_i into p(context | x), compute the partial derivative with respect to x:
∂ log p(context | x) / ∂x = Σ_i (1 − p_i) · θ_i
and update the new x into the corpus vector library;
S1.7, choose a corpus vector x that has not yet been updated and return to step S1.5, until every corpus vector x in the corpus vector library has been updated once, obtaining the new corpus vector library.
The corpus training set used in this embodiment is the set composed of 80% of the corpus vectors in the corpus vector library, with the corpus vectors in the training set ordered according to the word order in the corresponding classic poems;
the corpus training set is divided into a training corpus and a verification corpus in a ratio of 9:1, where the training corpus is used to train and adjust the parameter settings of the LSTM network model, and the verification corpus is used to verify and proofread the adjusted, trained LSTM network model.
This embodiment composes poems from an input picture; that is, the imagery words of step S4 are obtained by inputting a picture into an image feature extraction model, as follows:
S4.1, input the picture into the image feature extraction model, which extracts imagery words from the picture;
S4.2, the corpus processing mechanism matches each imagery word one by one with its corresponding corpus vector in the corpus vector library; these corpus vectors are the poem candidate words.
As shown in Fig. 6, the image feature extraction model is a modified VGG-16 convolutional neural network comprising, connected in sequence: convolutional group 1, a pooling layer (Pool), convolutional group 2, a pooling layer, convolutional group 3, a pooling layer, convolutional group 4, a pooling layer, convolutional group 5, a pooling layer, 2 convolutional layers, a Bounding-box layer, and a Softmax layer. Convolutional groups 1 and 2 each consist of 2 concatenated convolutional layers (Conv); convolutional groups 3, 4, and 5 each consist of 3 concatenated convolutional layers; and every convolutional layer is connected to the Bounding-box layer.
In this embodiment the preferred modified VGG-16 network is the structure shown in Fig. 7. The dashed part of the figure is the conventional VGG-16 structure, i.e., in sequence: 2 convolutional layers, a pooling layer, 2 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, a pooling layer, 3 convolutional layers, and a pooling layer. These are followed by 6 further convolutional layers in sequence, and finally the Bounding-box and Softmax layers. Compared with the structure of Fig. 4, the structure of Fig. 5 adds 4 convolutional layers before the Bounding-box layer in order to capture more image features. Every convolutional layer uses a 3 × 3 kernel and every pooling layer is 2 × 2.
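The layer sequence and feature-map sizes of the modified network can be sketched in plain Python (layer counts follow the description above; the 224 × 224 input size is an assumption borrowed from standard VGG-16):

```python
def modified_vgg16_layers(extra_convs=6):
    """Layer sequence: five VGG-16 conv groups (2, 2, 3, 3, 3 convolutions,
    each group followed by a 2x2 pool), then extra 3x3 convolutions before
    the Bounding-box and Softmax heads."""
    layers = []
    for n in (2, 2, 3, 3, 3):
        layers += ["conv3x3"] * n + ["pool2x2"]
    return layers + ["conv3x3"] * extra_convs + ["bounding-box", "softmax"]

def feature_map_size(input_size=224):
    # 3x3 convolutions with padding keep the spatial size;
    # each 2x2 pooling layer halves it.
    size = input_size
    for layer in modified_vgg16_layers():
        if layer == "pool2x2":
            size //= 2
    return size
```

With five pooling layers a 224 × 224 input is reduced to a 7 × 7 feature map, as in standard VGG-16.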
Embodiment two: the imagery word of step S4 is a single input word A. The corpus processing mechanism computes subsequent associated words by association from word A; word A together with the subsequent associated words forms a word string, and this word string constitutes the poem candidate words.
The subsequent associated words are computed by finding, from the corpus vector of the previous word, the word in the corpus vector library with the highest matching degree. The matching degree is computed as:
cos(a, b) = (a · b) / (‖a‖ ‖b‖)
where a is the corpus vector of the previous word and b is the corpus vector of any word in the corpus vector library; the word whose corpus vector b maximises cos(a, b) is taken as the next word.
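The cosine matching of word A's successors can be sketched as follows (a minimal illustration; the dictionary-based library is an assumption):

```python
import math

def cosine(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def next_word(prev_vector, library):
    # library: hypothetical {word: corpus vector} mapping; the word whose
    # vector maximises cos with the previous word's vector is chosen next.
    return max(library, key=lambda w: cosine(prev_vector, library[w]))

word = next_word([1.0, 0.0], {"wind": [0.9, 0.1], "rain": [0.1, 0.9]})
# "wind": its vector points in nearly the same direction as [1, 0]
```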
After the classical poems are input in step S1, words of the same or similar meaning may also be grouped to build an imagery poem-word set; the poem candidate words of step S4 then include both the input imagery word and its same- or similar-meaning words in the imagery poem-word set.
The style of the automatically composed poem is the seven-character regulated octave (qilü). Its level-and-oblique (平仄) tonal pattern is: "中平中仄仄平平，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。" or "中仄平平仄仄平，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。中仄中平平仄仄，中平中仄仄平平。中平中仄平平仄，中仄平平仄仄平。";
where 平 indicates the character must carry a level tone, 仄 an oblique tone, and 中 either.
The tonal-rule selection method of step S6 is then as follows: the poem screening mechanism compares the poem draft character by character against the tonal pattern; if a character does not conform, it is replaced with a word of the same or similar meaning from the imagery poem-word set and the comparison is repeated, until the poem draft fully conforms to the tonal pattern.
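The character-by-character tonal check and replacement loop can be sketched as follows (the tone lookup and synonym set are hypothetical stand-ins for a tone table and the imagery poem-word set):

```python
def conforms(char, slot, tone_of):
    # slot is "level" (平), "oblique" (仄) or "either" (中)
    return slot == "either" or tone_of(char) == slot

def enforce_pattern(line, pattern, tone_of, synonyms):
    """Compare a draft line character by character against the tonal
    pattern; replace non-conforming characters with a same-meaning
    word from the imagery set that does conform."""
    out = []
    for char, slot in zip(line, pattern):
        if conforms(char, slot, tone_of):
            out.append(char)
        else:
            out.append(next((s for s in synonyms.get(char, [])
                             if conforms(s, slot, tone_of)), char))
    return "".join(out)
```

If no conforming synonym exists the character is kept, mirroring the repeated comparison loop of step S6.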
Claims (10)
1. An automatic poem composition system based on classical poetry corpus vectorization, characterized by comprising: a corpus processing mechanism, a corpus vector library, an LSTM network model, and a poem screening mechanism;
the corpus processing mechanism is used to convert words into corpus vectors and to operate on corpus vectors;
the corpus vector library is used to store corpus vectors;
the LSTM network model is used to generate poem drafts;
the poem screening mechanism is used to apply rhyme and tonal-pattern operations to poem drafts;
the corpus processing mechanism is bidirectionally connected with the corpus vector library, and the corpus processing mechanism, the LSTM network model, and the poem screening mechanism are connected in sequence.
2. The automatic poem composition system based on classical poetry corpus vectorization according to claim 1, characterized in that the LSTM network model is a network model composed of two LSTM layers in series, its optimization function is stochastic gradient descent, and its loss function is cross entropy.
3. The automatic poem composition system based on classical poetry corpus vectorization according to claim 2, characterized in that 20% of the data is dropped (dropout) after each computation of the LSTM network model, the learning rate is 0.01, and the number of iterations is 700.
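A single LSTM step can be sketched in scalar form to make the gating explicit; stacking two such cells in series, feeding the first cell's hidden state to the second, gives the two-layer structure of claims 2-3 (the scalar weights are illustrative assumptions, not the claimed parameterization):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One step of a single-unit LSTM cell.
    W maps gate name ("i", "f", "o", "g") -> (w_x, w_h, bias)."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate
    c_new = f * c + i * g          # new cell state
    h_new = o * math.tanh(c_new)   # new hidden state
    return h_new, c_new
```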
4. An automatic poem composition method based on classical poetry corpus vectorization, characterized by the following steps:
S1: input classical poems into the corpus processing mechanism, which converts the words of the poems into corpus vectors and stores them in the corpus vector library;
S2: build the LSTM network model;
S3: input the corpus training set into the LSTM network model to complete its training;
S4: input imagery words into the corpus processing mechanism, which computes poem candidate words from the corpus vectors corresponding to each imagery word in the corpus vector library;
S5: the corpus processing mechanism inputs the poem candidate words into the LSTM network model to obtain a poem draft;
S6: the poem screening mechanism selects, from the poem draft, the wording that best satisfies the rhyme and tonal rules of the verse form, yielding the final poem as the automatic composition result.
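Steps S4-S6 chain the three mechanisms together; a minimal orchestration sketch (all callables are hypothetical stand-ins for the corpus processing mechanism, the LSTM model, and the screening mechanism):

```python
def compose_poem(imagery_words, corpus_vectors, generate_draft, screen):
    # S4: map imagery words to their corpus vectors (poem candidate words)
    candidates = [corpus_vectors[w] for w in imagery_words if w in corpus_vectors]
    # S5: the LSTM model turns candidate words into a poem draft
    draft = generate_draft(candidates)
    # S6: screen the draft against rhyme and tonal rules -> final poem
    return screen(draft)

poem = compose_poem(["moon"], {"moon": [0.5]},
                    generate_draft=lambda c: f"draft:{len(c)}",
                    screen=lambda d: d.upper())
```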
5. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that step S1's
Particular content is as follows:
S1.1, input classic poetry to corpus processing mechanism, the corpus processing mechanism split each word occurred in classic poetry, note
For m unduplicated words, the same unduplicated word is denoted as wherein occurring being greater than the primary same word;
S1.2 counts the frequency of occurrence of each unduplicated word and its word that the context that occurs in every first poem is adjacent;
S1.3, the corpus processing mechanism be each unduplicated word random n-dimensional vector is set, the n-dimensional vector be this not
The corpus vector of duplicate word, by the corresponding deposit corpus vector library of the corpus vector, n ∈ [180,220], and n is integer;
S1.4, constructs Huffman tree, and the Huffman tree includes endpoint node and intermediate node, during each endpoint node is
The child node of intermediate node, each intermediate node only have 2 child nodes, and each endpoint node is respectively directed in corpus vector library
The corpus vector of one unduplicated word, and remember that the nodal value of the endpoint node is the frequency of occurrence of corresponding unduplicated word, often
A intermediate node note nodal value is the nodal value summation of its child node, and the bigger endpoint node of nodal value is closer from root node, institute
Stating root node is the maximum intermediate node of nodal value;
S1.5, the selection probability of the adjacent word of the context of any corpus vector x on the Huffman tree are as follows:
P (context | x)=Π pi
Wherein, piThe probability of its first child node is chosen for upper i-th of the intermediate node of Huffman tree:
X is the corpus vector of intermediate node input, θiWeight for the corpus vector inputted on i-th of intermediate node;
S1.6, using gradient descent method to x, θiLocal derviation is sought respectively:
First calculate θiLocal derviation:
By new θiIt is corresponding to be updated to the local derviation for calculating x after p (context | x) again:
Corresponding update of new x is arrived into corpus vector library;
S1.7 chooses a corpus vector x not updated and returns to step S1.5, again until each corpus in corpus vector library
Vector x is all updated primary, obtains new corpus vector library.
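The Huffman construction of step S1.4 can be sketched with a heap; more frequent words end up with shorter root-to-leaf paths, i.e. fewer intermediate-node factors in P(context | x). The heap-based construction and its tie-breaking are assumptions; the claim only fixes the tree's shape properties:

```python
import heapq

def build_huffman(freqs):
    """Build the Huffman tree of step S1.4: leaves are words, each
    intermediate node has exactly two children, and a node's value is
    the sum of its children's occurrence counts."""
    heap = [(count, i, word) for i, (word, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next_id, (left, right)))
        next_id += 1
    return heap[0][2]

def path_lengths(tree, depth=0):
    # Root-to-leaf path length = number of intermediate-node probabilities
    # multiplied together for that word in P(context | x).
    if isinstance(tree, str):
        return {tree: depth}
    result = {}
    for child in tree:
        result.update(path_lengths(child, depth + 1))
    return result
```

For occurrence counts {"spring": 5, "frost": 1, "crane": 1} the frequent word sits one edge from the root while the rare words sit two edges down, matching the claim's "larger node values closer to the root".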
6. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the corpus
The set for the corpus vector composition that training set is in corpus vector library 80%, and the corpus vector in the corpus training set is according to right
The word order in classic poetry is answered to sort;
The corpus training set is training corpus and verifying corpus according to the ratio cut partition of 9:1, and wherein training corpus is for training
The parameter setting of LSTM network model is adjusted, verifying corpus proofreads trained LSTM network model adjusted for verifying.
7. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the step
The image word of S4 is the image image word that input picture to image characteristics extraction model obtains, and the specific method is as follows:
S4.1, input picture to image characteristics extraction model, described image Feature Selection Model extract image word from image
Language;
S4.2, the corpus processing mechanism are that the image word matches corresponding corpus vector in corpus vector library one by one, should
Corpus vector is poem alternative word.
8. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 7, it is characterised in that: described image
Feature Selection Model is improved VGG-16 convolutional neural networks model, including sequentially connected convolutional layer group 1, pond layer, volume
Lamination group 2, pond layer, convolutional layer group 3, pond layer, convolutional layer group 4, pond layer, convolutional layer group 5, pond layer, 2 convolutional layers,
Bounding-box layers and Softmax layers, wherein the convolutional layer group that the convolutional layer group 1, convolutional layer group 2 are concatenated by 2
At the convolutional layer that the convolutional layer group 3, convolutional layer group 4, convolutional layer group 5 are concatenated by 3 forms, and each convolutional layer connects
It connects Bounding-box layers described.
9. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: the step
The image word of S4 is one words A of input, and subsequent association is calculated according to words A association in the corpus processing mechanism
Word, words A and subsequent association word form word string, and the word string is poem alternative word;
The method of the calculated for subsequent conjunctive word is that matching degree is found out in corpus vector library according to the corpus vector of previous words most
High next words, the calculating of matching degree are as follows:
Wherein, a be previous words corpus vector, b be corpus vector library in any words corpus vector, then meet cos (a,
B) words corresponding to maximum corpus vector b is next words.
10. the method for composing poem automatically based on classic poetry corpus vectorization according to claim 4, it is characterised in that: step S1
Image collection of tunes of poems also is established into the same or similar words classification of meaning after input classic poetry, poem alternative word packet described in step S4
The image word and its same or similar words of meaning in image collection of tunes of poems for including input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810817519.7A CN109086270B (en) | 2018-07-24 | 2018-07-24 | Automatic poetry making system and method based on ancient poetry corpus vectorization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810817519.7A CN109086270B (en) | 2018-07-24 | 2018-07-24 | Automatic poetry making system and method based on ancient poetry corpus vectorization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086270A true CN109086270A (en) | 2018-12-25 |
CN109086270B CN109086270B (en) | 2022-03-01 |
Family
ID=64838256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810817519.7A Active CN109086270B (en) | 2018-07-24 | 2018-07-24 | Automatic poetry making system and method based on ancient poetry corpus vectorization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086270B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309510A (en) * | 2019-07-02 | 2019-10-08 | 中国计量大学 | A picture-viewing poem composition method based on C-S and GRU |
CN110738061A (en) * | 2019-10-17 | 2020-01-31 | 北京搜狐互联网信息服务有限公司 | Ancient poetry generation method, device and equipment and storage medium |
CN111814488A (en) * | 2020-07-22 | 2020-10-23 | 网易(杭州)网络有限公司 | Poetry generation method and device, electronic equipment and readable storage medium |
CN112101006A (en) * | 2020-09-14 | 2020-12-18 | 中国平安人寿保险股份有限公司 | Poetry generation method and device, computer equipment and storage medium |
CN112257775A (en) * | 2020-10-21 | 2021-01-22 | 东南大学 | Method for composing poems from images based on a convolutional neural network and an unsupervised language model |
CN112434145A (en) * | 2020-11-25 | 2021-03-02 | 天津大学 | Picture-viewing poetry method based on image recognition and natural language processing |
CN112883710A (en) * | 2021-01-13 | 2021-06-01 | 戴宇航 | Method for optimizing poems authored by user |
CN113051877A (en) * | 2021-03-11 | 2021-06-29 | 杨虡 | Text content generation method and device, electronic equipment and storage medium |
CN113553822A (en) * | 2021-07-30 | 2021-10-26 | 网易(杭州)网络有限公司 | Ancient poetry generation model training method, ancient poetry generation equipment and storage medium |
CN116070643A (en) * | 2023-04-03 | 2023-05-05 | 武昌理工学院 | Fixed style translation method and system from ancient text to English |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1889366A (en) * | 2006-07-13 | 2007-01-03 | 浙江大学 | Huffman decoding method |
CN104951554A (en) * | 2015-06-29 | 2015-09-30 | 浙江大学 | Method for matching landscape with verses according with artistic conception of landscape |
CN105930318A (en) * | 2016-04-11 | 2016-09-07 | 深圳大学 | Word vector training method and system |
CN105955964A (en) * | 2016-06-13 | 2016-09-21 | 北京百度网讯科技有限公司 | Method and apparatus for automatically generating poem |
CN106569995A (en) * | 2016-09-26 | 2017-04-19 | 天津大学 | Method for automatically generating Chinese poetry based on corpus and metrical rule |
CN107102981A (en) * | 2016-02-19 | 2017-08-29 | 腾讯科技(深圳)有限公司 | Term vector generation method and device |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN107480132A (en) * | 2017-07-25 | 2017-12-15 | 浙江工业大学 | A kind of classic poetry generation method of image content-based |
CN107832292A (en) * | 2017-11-02 | 2018-03-23 | 合肥工业大学 | A kind of conversion method based on the image of neural network model to Chinese ancient poetry |
US20180190249A1 (en) * | 2016-12-30 | 2018-07-05 | Google Inc. | Machine Learning to Generate Music from Text |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1889366A (en) * | 2006-07-13 | 2007-01-03 | 浙江大学 | Huffman decoding method |
CN104951554A (en) * | 2015-06-29 | 2015-09-30 | 浙江大学 | Method for matching landscape with verses according with artistic conception of landscape |
CN107102981A (en) * | 2016-02-19 | 2017-08-29 | 腾讯科技(深圳)有限公司 | Term vector generation method and device |
CN105930318A (en) * | 2016-04-11 | 2016-09-07 | 深圳大学 | Word vector training method and system |
CN105955964A (en) * | 2016-06-13 | 2016-09-21 | 北京百度网讯科技有限公司 | Method and apparatus for automatically generating poem |
CN106569995A (en) * | 2016-09-26 | 2017-04-19 | 天津大学 | Method for automatically generating Chinese poetry based on corpus and metrical rule |
US20180190249A1 (en) * | 2016-12-30 | 2018-07-05 | Google Inc. | Machine Learning to Generate Music from Text |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN107480132A (en) * | 2017-07-25 | 2017-12-15 | 浙江工业大学 | A kind of classic poetry generation method of image content-based |
CN107832292A (en) * | 2017-11-02 | 2018-03-23 | 合肥工业大学 | A kind of conversion method based on the image of neural network model to Chinese ancient poetry |
Non-Patent Citations (4)
Title |
---|
Qixin Wang et al.: "Chinese Song Iambics Generation with Neural Attention-based Model", arXiv:1604.06274v2 [cs.CL] *
Yulia Tsvetkov et al.: "Evaluation of Word Vector Representations by Subspace Alignment", Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing *
Zhou Changle et al.: "Computational Research on Chinese Classical Poetry and Couplets", Mind and Computation *
Su Jinsong et al.: "Building a Segmented Corpus of the Complete Song Ci Based on Statistical Word Extraction and Metrical Rules", Journal of Chinese Information Processing *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309510A (en) * | 2019-07-02 | 2019-10-08 | 中国计量大学 | A picture-viewing poem composition method based on C-S and GRU |
CN110738061B (en) * | 2019-10-17 | 2024-05-28 | 北京搜狐互联网信息服务有限公司 | Ancient poetry generating method, device, equipment and storage medium |
CN110738061A (en) * | 2019-10-17 | 2020-01-31 | 北京搜狐互联网信息服务有限公司 | Ancient poetry generation method, device and equipment and storage medium |
CN111814488A (en) * | 2020-07-22 | 2020-10-23 | 网易(杭州)网络有限公司 | Poetry generation method and device, electronic equipment and readable storage medium |
CN111814488B (en) * | 2020-07-22 | 2024-06-07 | 网易(杭州)网络有限公司 | Poem generation method and device, electronic equipment and readable storage medium |
CN112101006A (en) * | 2020-09-14 | 2020-12-18 | 中国平安人寿保险股份有限公司 | Poetry generation method and device, computer equipment and storage medium |
CN112257775A (en) * | 2020-10-21 | 2021-01-22 | 东南大学 | Method for composing poems from images based on a convolutional neural network and an unsupervised language model |
CN112434145A (en) * | 2020-11-25 | 2021-03-02 | 天津大学 | Picture-viewing poetry method based on image recognition and natural language processing |
CN112883710A (en) * | 2021-01-13 | 2021-06-01 | 戴宇航 | Method for optimizing poems authored by user |
CN113051877A (en) * | 2021-03-11 | 2021-06-29 | 杨虡 | Text content generation method and device, electronic equipment and storage medium |
CN113553822B (en) * | 2021-07-30 | 2023-06-30 | 网易(杭州)网络有限公司 | Ancient poetry generating model training, ancient poetry generating method, equipment and storage medium |
CN113553822A (en) * | 2021-07-30 | 2021-10-26 | 网易(杭州)网络有限公司 | Ancient poetry generation model training method, ancient poetry generation equipment and storage medium |
CN116070643A (en) * | 2023-04-03 | 2023-05-05 | 武昌理工学院 | Fixed style translation method and system from ancient text to English |
CN116070643B (en) * | 2023-04-03 | 2023-08-15 | 武昌理工学院 | Fixed style translation method and system from ancient text to English |
Also Published As
Publication number | Publication date |
---|---|
CN109086270B (en) | 2022-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086270A (en) | System and method of composing poem automatically based on classic poetry corpus vectorization | |
CN109902171B (en) | Text relation extraction method and system based on hierarchical knowledge graph attention model | |
CN108280064B (en) | Combined processing method for word segmentation, part of speech tagging, entity recognition and syntactic analysis | |
CN107291693B (en) | Semantic calculation method for improved word vector model | |
CN106569995B (en) | Chinese ancient poetry word automatic generation method based on corpus and rules and forms rule | |
CN108415977A (en) | A generative machine reading comprehension method based on deep neural networks and reinforcement learning | |
CN105631468B (en) | An automatic image description generation method based on RNN | |
CN108710680A (en) | A film recommendation method using deep learning for sentiment analysis | |
CN107273355A (en) | A kind of Chinese word vector generation method based on words joint training | |
CN108153864A (en) | Method based on neural network generation text snippet | |
CN108647191A (en) | It is a kind of based on have supervision emotion text and term vector sentiment dictionary construction method | |
CN109271628A (en) | An image description generation method | |
CN109960747A (en) | The generation method of video presentation information, method for processing video frequency, corresponding device | |
CN107766320A (en) | A kind of Chinese pronoun resolution method for establishing model and device | |
CN106502979A (en) | A kind of data processing method of natural language information and device | |
CN112069199A (en) | Multi-round natural language SQL conversion method based on intermediate syntax tree | |
CN113342933B (en) | Multi-feature interactive network recruitment text classification method similar to double-tower model | |
CN108038104A (en) | A kind of method and device of Entity recognition | |
CN110059220A (en) | A kind of film recommended method based on deep learning Yu Bayesian probability matrix decomposition | |
CN107679225A (en) | A kind of reply generation method based on keyword | |
CN110047462A (en) | A kind of phoneme synthesizing method, device and electronic equipment | |
CN111914555A (en) | Automatic relation extraction system based on Transformer structure | |
CN108563637A (en) | A kind of sentence entity complementing method of fusion triple knowledge base | |
CN108519976A (en) | The method for generating extensive sentiment dictionary based on neural network | |
CN114220095A (en) | Image semantic description improvement method based on instance segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||