CN110134960A - Text generation method and related device - Google Patents

Text generation method and related device

Info

Publication number
CN110134960A
Authority
CN
China
Prior art keywords
sentence
text
coding
requirement
special format
Legal status
Pending
Application number
CN201910409516.4A
Other languages
Chinese (zh)
Inventor
Wang Liang (王亮)
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910409516.4A
Publication of CN110134960A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The embodiments of the invention disclose a text generation method and related device that can generate text meeting a specific format requirement. The method includes: obtaining a target code, the target code being the first-sentence code of a to-be-generated text that meets a special format requirement; inputting the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement together with the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement; and generating, according to the target sequence code, the text that meets the special format requirement.

Description

Text generation method and related device
Technical field
The present invention relates to the field of natural language processing, and in particular to a text generation method and related device.
Background
With the development of NLP (Natural Language Processing) technology, AI (Artificial Intelligence) text generation applications such as AI poetry writing, AI product descriptions and AI automatic lyric writing have begun to appear on major websites and platforms. AI text generation can effectively save labor costs, meet the requirement of generating large volumes of content in real time, and produce diverse results that allow personalized, private customization; it therefore has great application potential.
An existing method generates a keyword sequence by "predicting the keyword of the next sentence while generating the text of the current sentence", so as to keep the logical continuity from one sentence to the next.
Although the sentences generated this way follow one another logically, the method does not satisfy certain specific requirements of text in a specific format. For example, poems and lyrics usually require the even-numbered lines to rhyme, and lyrics may even have length requirements so as to fit a specific rhythm.
Summary of the invention
The embodiments of the present invention provide a text generation method and related device, for generating text that not only keeps consecutive sentences logically coherent but also meets a specific format requirement.
A first aspect of the embodiments of the present invention provides a text generation method, comprising:
obtaining a target code, the target code being the first-sentence code of a to-be-generated text that meets a special format requirement;
inputting the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
generating, according to the target sequence code, the text that meets the special format requirement.
Optionally, the target model is a recurrent neural network model, and before inputting the target code into the preset model to output the target sequence code corresponding to the target code, the method further comprises:
obtaining each text;
encoding the sentences in each text according to the special format requirement, to obtain the sentence codes of the sentences in each text;
iteratively updating the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text;
when a preset iteration stop condition is reached, determining the recurrent neural network model at that point as the preset model.
Optionally, the special format requirement includes a rhyme requirement, a rhythm requirement and/or a content requirement, and encoding the sentences in each text according to the special format requirement to obtain the sentence codes of the sentences in each text comprises:
performing rhyme encoding on the sentences in each text based on the final (the vowel part of the Chinese syllable) of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme code corresponding to the rhyme requirement;
and/or
performing semantic encoding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content code corresponding to the content requirement;
and/or
performing rhythm encoding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm code corresponding to the rhythm requirement;
determining the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes and/or the content codes.
Optionally, obtaining the target code comprises:
determining whether an operation instruction from a user has been received;
if so, obtaining the target code from the corpus according to the user's operation instruction;
if not, determining a sentence code randomly selected from the corpus as the target code.
Optionally, generating, according to the target sequence code, the text that meets the special format requirement comprises:
determining, in the corpus, the target sentence codes identical to the sentence codes in the target sequence code;
generating the text that meets the special format requirement from the sentences corresponding to the target sentence codes.
A second aspect of the embodiments of the present invention provides a text generation apparatus, comprising:
an acquiring unit, configured to obtain a target code, the target code being the first-sentence code of a to-be-generated text that meets a special format requirement;
a processing unit, configured to input the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
a generation unit, configured to generate, according to the target sequence code, the text that meets the special format requirement.
Optionally, the target model is a recurrent neural network model, and the apparatus further comprises:
a training unit, configured to:
obtain each text;
encode the sentences in each text according to the special format requirement, to obtain the sentence codes of the sentences in each text;
iteratively update the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text;
when a preset iteration stop condition is reached, determine the recurrent neural network model at that point as the preset model.
Optionally, the special format requirement includes a rhyme requirement, a rhythm requirement and/or a content requirement, and when encoding the sentences in each text according to the special format requirement to obtain the sentence codes of the sentences in each text, the training unit is specifically configured to:
perform rhyme encoding on the sentences in each text based on the final of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme code corresponding to the rhyme requirement;
and/or
perform semantic encoding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content code corresponding to the content requirement;
and/or
perform rhythm encoding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm code corresponding to the rhythm requirement;
determine the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes and/or the content codes.
Optionally, the apparatus further comprises:
a judging unit, configured to:
determine whether the number of iterations has reached a preset value, and if so, determine that the preset iteration stop condition is met;
or,
the judging unit is further configured to:
determine whether the model parameters of the recurrent neural network model have converged, and if so, determine that the preset iteration stop condition is met.
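The two alternative stop conditions checked by the judging unit can be sketched as one predicate. This is a minimal illustration, not the patent's implementation: the function name `reached_stop_condition`, the defaults for `max_iters` and `tol`, and the choice of "the norm of the parameter update falls below `tol`" as the convergence test are all assumptions, since the text does not say how convergence is measured. The toy gradient-descent loop merely stands in for an RNN training step.

```python
import math
from typing import Sequence, Tuple


def reached_stop_condition(
    iteration: int,
    params: Sequence[float],
    prev_params: Sequence[float],
    max_iters: int = 10_000,
    tol: float = 1e-6,
) -> bool:
    """Return True if either preset stop condition holds.

    Condition 1: the iteration count has reached the preset value.
    Condition 2 (assumed convergence test): the Euclidean norm of the
    parameter change since the previous iteration is below `tol`.
    """
    if iteration >= max_iters:
        return True
    delta = math.sqrt(sum((p - q) ** 2 for p, q in zip(params, prev_params)))
    return delta < tol


def toy_training_loop(lr: float = 0.1) -> Tuple[int, float]:
    """Stand-in for RNN training: gradient descent on f(w) = (w - 3)^2."""
    w = [0.0]
    iteration = 0
    while True:
        prev = list(w)
        w = [w[0] - lr * 2.0 * (w[0] - 3.0)]  # one parameter update
        iteration += 1
        if reached_stop_condition(iteration, w, prev, max_iters=1000, tol=1e-8):
            return iteration, w[0]


if __name__ == "__main__":
    steps, w = toy_training_loop()
    print(f"stopped after {steps} iterations, w = {w:.6f}")
```

Treating the two checks as an OR mirrors the "or" between the two judging-unit configurations; a real system would more likely monitor validation loss than raw parameter change.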
Optionally, the acquiring unit is specifically configured to:
determine whether an operation instruction from a user has been received;
if so, obtain the target code from the corpus according to the user's operation instruction;
if not, determine a sentence code randomly selected from the corpus as the target code.
Optionally, the generation unit is specifically configured to:
determine, in the corpus, the target sentence codes identical to the sentence codes in the target sequence code;
generate the text that meets the special format requirement from the sentences corresponding to the target sentence codes.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the steps of the text generation method described above.
A fourth aspect of the embodiments of the present invention provides a computer program product including instructions which, when run on a computer, cause the computer to perform the steps of the text generation method described above.
As can be seen from the above, in the embodiments provided by the present invention, a target code can be input into a preset model to output the corresponding target sequence code, where the target code is the first-sentence code of a to-be-generated text that meets a special format requirement; the text that meets the special format requirement is then generated according to the target sequence code. Because the format constraints are encoded into the text generation process, the generated text meets the specific format requirement.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the text generation method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of training the preset model provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the virtual structure of the text generation apparatus provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the hardware structure of the text generation apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a text generation method and related device, for generating text that meets specific requirements.
The terms "first", "second", "third", "fourth" and the like (if any) in the specification, the claims and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The text generation method of the present invention is described below from the perspective of a text generation apparatus. The text generation apparatus may be a server or a service unit within a server; this is not specifically limited.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a text generation method provided by an embodiment of the present invention, comprising:
101. Obtain a target code.
In this embodiment, the text generation apparatus can obtain a target code. The target code is the first-sentence code of a to-be-generated text that meets a special format requirement, the first-sentence code being the code of the first sentence of the text. The special format requirement is the format requirement of a specific format style (for example, texts in special formats such as RAP lyrics, classical poems or modern poems), and includes a rhyme requirement, a rhythm requirement and/or a content requirement. In other words, the special format requirement may include different requirements depending on the format of the text: RAP lyrics require not only rhyme but also rhythm and content, whereas modern poems impose no rhythm or rhyme requirement and only require content. Specifically, the target code can be obtained as follows:
Determine whether an operation instruction from the user has been received. When one is received, obtain the target code from the corpus according to that instruction. For example, if the user specifies "friendship" as a specific word, one sentence code, such as "friendship_ou_2", can be randomly selected from the sentence codes in the corpus that contain "friendship" and used as the target code. When no operation instruction is received, a sentence code can be randomly selected from the corpus and determined as the target code, for example the randomly selected target code "i_love_2". The target code may of course also be obtained in other ways; the above is only an example and does not constitute a limitation.
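The branching in step 101 can be sketched as follows. It is a hedged sketch, not the patent's implementation: the user's operation instruction is modeled, as an assumption, as an optional keyword; sentence codes are assumed to be underscore-joined strings like those in this description; and the function name `obtain_target_code` is hypothetical.

```python
import random
from typing import Optional, Sequence


def obtain_target_code(
    corpus_codes: Sequence[str],
    user_keyword: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the first-sentence (target) code for a new text.

    If a user instruction (modeled here as a keyword) was received, choose
    randomly among corpus codes containing that keyword; otherwise choose
    any corpus code at random.
    """
    if rng is None:
        rng = random.Random()
    if user_keyword is not None:
        candidates = [c for c in corpus_codes if user_keyword in c.split("_")]
        if not candidates:
            raise ValueError(f"no sentence code contains {user_keyword!r}")
        return rng.choice(candidates)
    return rng.choice(list(corpus_codes))


codes = ["friendship_ou_2", "friendship_ian_1", "i_love_2", "city_ian_1"]
print(obtain_target_code(codes, "friendship", random.Random(0)))
print(obtain_target_code(codes, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the choice reproducible for testing, while the default unseeded generator gives the variety the description is after.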
102. Input the target code into the preset model to output the target sequence code corresponding to the target code.
In this embodiment, the text generation apparatus can train a model in advance and use it as the preset model; the preset model is used to output, from the input target code, the corresponding target sequence code. That is, the target code is input into the preset model, which outputs a second code; the second code is then used as the input of the preset model, which outputs a third code; the third code is used as the input, and a fourth code is output; and so on, until an end symbol is output. The target code, the second code, the third code, ..., and the n-th code are determined as the target sequence code.
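The feedback loop described above, in which each output code becomes the next input until the end symbol appears, can be sketched as below. The trained model is replaced by a hypothetical `predict_next` callable, and the `<EOS>` token string and the `max_len` safety cap are assumptions not found in the text.

```python
from typing import Callable, List

EOS = "<EOS>"  # assumed end-symbol token


def generate_target_sequence(
    target_code: str,
    predict_next: Callable[[List[str]], str],
    max_len: int = 32,
) -> List[str]:
    """Feed each output back in as the next input until EOS is produced.

    `predict_next` stands in for the trained preset model: it receives the
    codes generated so far (an RNN would carry this history in its hidden
    state) and returns the next sentence code. `max_len` is a safety cap
    that the patent text does not mention.
    """
    sequence = [target_code]
    while len(sequence) < max_len:
        next_code = predict_next(sequence)
        if next_code == EOS:
            break
        sequence.append(next_code)
    return sequence


# Toy "model": a fixed lookup from the last code to the next one.
toy_transitions = {"i_love_2": "i_past_2", "i_past_2": "i_belong_2", "i_belong_2": EOS}
print(generate_target_sequence("i_love_2", lambda seq: toy_transitions[seq[-1]]))
# ['i_love_2', 'i_past_2', 'i_belong_2']
```

Note that the returned sequence starts with the target code itself, matching the statement that the target code, second code, ..., n-th code together form the target sequence code.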
It should be noted that the preset model is obtained by training a target model on training data (the target model may be a recurrent neural network (RNN) model or another model; this is not specifically limited). The training data includes each text in the corpus and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement. That is, the sentences in each text are encoded in a way that meets the specific format requirement of the text to be generated. For example, if the text to be generated is a poem or lyrics, the encoding of the sentences in each text stored in the corpus must capture requirements such as rhyme between even-numbered lines, lyric length, and a specific rhythm.
The following uses a section of RAP (Chinese rap) lyrics as an example to explain how the sentences of each text in the corpus are encoded (English translation of the Chinese lyrics; in the original, every line ends with a syllable whose final is "ian"):
The first day I rode the train to Beijing;
The city bustles with traffic, and everything feels fresh to me;
Looking at my own young, tender face in the mirror;
I want to carve out a world of my own in this city.
It should be understood that, compared with free text, RAP lyrics need to meet the following requirements:
1. Rhyme requirement: the lyrics must rhyme. Especially in the verse (main song) section, the last words should rhyme as much as possible (single rhyme, double rhyme, multi-syllable rhyme, etc.); at minimum, the first line and the even-numbered lines must rhyme;
2. Rhythm requirement: the lyrics must follow a specific flow (rhythm). Basic principles include keeping the rhythm of each pair of lines (an odd line plus the following even line, together making up eight beats) the same as far as possible, and avoiding excessive rhythm changes between adjacent pairs (limiting a verse to only one or two tempo changes);
3. Content requirement: the lyrics must read smoothly, and the context must be semantically coherent.
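One way to make the first two requirements concrete is to check them on the sentence codes of a finished verse. This sketch is an interpretation, not part of the patent: it assumes codes of the form `keyword(s)_final_level` as in the examples in this description, treats the rhythm level as a proxy for flow, reads "pair" as lines (1, 2), (3, 4), ..., and uses the upper bound of two tempo changes per verse as `max_changes`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_code(code: str) -> Tuple[List[str], str, int]:
    """Split 'kw1_kw2_final_level' into (keywords, final, rhythm level)."""
    *keywords, final, level = code.split("_")
    return keywords, final, int(level)


def rhyme_ok(codes: Sequence[str]) -> bool:
    """Line 1 and every even-numbered line (2, 4, ...) share one final."""
    finals = [parse_code(c)[1] for c in codes]
    required = [finals[0]] + finals[1::2]  # 0-based indices 0, 1, 3, 5, ...
    return len(set(required)) == 1


def rhythm_ok(codes: Sequence[str], max_changes: int = 2) -> bool:
    """At most `max_changes` rhythm changes between adjacent line pairs."""
    levels = [parse_code(c)[2] for c in codes]
    pairs = [tuple(levels[i:i + 2]) for i in range(0, len(levels), 2)]
    changes = sum(1 for a, b in zip(pairs, pairs[1:]) if a != b)
    return changes <= max_changes


verse = ["train_Beijing_ian_2", "city_ian_1", "mirror_face_ian_1", "city_ian_2"]
print(rhyme_ok(verse), rhythm_ok(verse))  # True True
```

Odd lines after the first (3, 5, ...) are deliberately left unconstrained, since the text only demands rhyme on the first and even-numbered lines.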
Corresponding to the three format requirements that the RAP lyrics must meet, each line of the RAP lyrics is encoded as follows:
A. Rhyme encoding uses the final of the last character;
B. Rhythm encoding uses sentence length. For example, after word segmentation, a lyric line shorter than 8 is level 1, one of length 8 to 15 is level 2, and one longer than 15 is level 3. (Sentences with different rhythms differ in length, but sentences of equal length may still have different rhythms; sentence length reflects a sentence's rhythm to some extent but cannot fully determine it, so encoding rhythm by sentence length yields an approximate rhythm code for each sentence.)
C. Semantic encoding uses keywords (for example, tags). The keywords of each sentence can be extracted by a keyword extraction algorithm such as TextRank, or by other means; this is not specifically limited, as long as the keywords of each sentence can be extracted.
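Rules A and B can be sketched for a single line as below, with rule C supplied as precomputed keywords. Everything here is illustrative: the tiny `FINALS` table stands in for a real character-to-pinyin-final lookup, the word segmentation is done by hand rather than by a segmenter, the keywords are given directly instead of being extracted with TextRank, and they are written in English only to match the codes shown in this document.

```python
from typing import Dict, Sequence

# Hypothetical stand-in for a full character -> pinyin final table.
FINALS: Dict[str, str] = {"天": "ian", "鲜": "ian", "脸": "ian", "你": "i"}


def rhythm_level(num_tokens: int) -> int:
    """Rule B: fewer than 8 tokens -> 1, 8 to 15 -> 2, more than 15 -> 3."""
    if num_tokens < 8:
        return 1
    if num_tokens <= 15:
        return 2
    return 3


def encode_sentence(tokens: Sequence[str], keywords: Sequence[str]) -> str:
    """Build 'keywords_final_level' from a segmented line (rules A to C)."""
    last_char = tokens[-1][-1]
    final = FINALS[last_char]  # rule A: final of the last character
    level = rhythm_level(len(tokens))  # rule B: length after segmentation
    return "_".join([*keywords, final, str(level)])  # rule C: keywords first


# Second line of the example lyrics, segmented by hand into 7 tokens.
line = ["城市", "车水马龙", "让", "我", "感觉", "很", "新鲜"]
print(encode_sentence(line, ["city"]))  # city_ian_1
```

The hand segmentation above happens to reproduce the code listed for that line; a real segmenter could split differently and so land in a different rhythm level.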
With these three encoding methods, the transitions from one line to the next satisfy the rhyme, rhythm and content requirements of the lyrics at the same time.
In this way, the above RAP lyrics are converted into the following codes:
The first day I rode the train to Beijing -> train_Beijing_ian_2;
The city bustles with traffic, and everything feels fresh to me -> city_ian_1;
Looking at my own young, tender face in the mirror -> mirror_face_ian_1;
I want to carve out a world of my own in this city -> city_ian_2.
Using the above approach, the sentences in all texts in the corpus that meet the special format requirement can be encoded. The target model is then trained on the encoded texts one text at a time, finally yielding the preset model. The target code is then input into the preset model to output the corresponding target sequence code.
It should be noted that after each text in the corpus has been encoded according to the special format requirement, the codes of the sentences in each text can be saved in the form of a dictionary, for retrieval when generating text that meets the special format requirement. As a result, a single sentence code may correspond to multiple sentences, which can be used to increase the randomness of the generated lyrics.
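The dictionary described above can be sketched as a mapping from each sentence code to every corpus sentence that carries it, so that a random choice among them adds variety. The function names and the (sentence, code) input format are assumptions for illustration, and the sentences are placeholders.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_code_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each sentence code to all corpus sentences encoded with it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for sentence, code in pairs:
        index[code].append(sentence)
    return dict(index)


def pick_sentence(index: Dict[str, List[str]], code: str,
                  rng: Optional[random.Random] = None) -> str:
    """Randomly choose one sentence for a code; raises KeyError if unseen."""
    rng = rng if rng is not None else random.Random()
    return rng.choice(index[code])


corpus = [("sentence A", "city_ian_1"), ("sentence B", "city_ian_1"),
          ("sentence C", "mirror_face_ian_1")]
index = build_code_index(corpus)
print(index["city_ian_1"])  # ['sentence A', 'sentence B']
print(pick_sentence(index, "city_ian_1", random.Random(0)))
```

Converting the `defaultdict` back to a plain `dict` means an unseen code raises `KeyError` instead of silently creating an empty entry.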
103. Generate, according to the target sequence code, the text that meets the special format requirement.
In this embodiment, after the target sequence code is obtained, the text that meets the special format requirement can be generated from it. Specifically, target sentence codes identical to the sentence codes in the target sequence code are determined in the corpus, and the text that meets the special format requirement is generated from the sentences corresponding to those target sentence codes. Because the sentences of each text have already been encoded according to the special format requirement, target sentence codes identical to those in the target sequence code can be found in the corpus; the corresponding sentences are then combined, in the order of the sentence codes in the target sequence code, into the text that meets the special format requirement. For example, if the target sequence code is [i_love_2, i_past_2, i_belong_2, i_porcelain_2, i_accompany_2, i_help_2], matching its sentence codes against the sentence codes in the corpus yields the following text:
Whatever I say, I still love you so much (the sentence for code i_love_2);
Let's never bring up the past again (the sentence for code i_past_2);
I still want to belong entirely to you (the sentence for code i_belong_2);
You and I have become really close (the sentence for code i_porcelain_2; "porcelain" renders the slang 瓷, meaning close);
You always ask me to keep you company (the sentence for code i_accompany_2);
Apart from me, who else can help you (the sentence for code i_help_2).
It should be noted that when a sentence code corresponds to multiple sentences, the texts formed with each of those sentences can be shown to the user for selection, or the most frequently used sentence can be selected directly; other approaches are also possible, and this is not specifically limited.
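Step 103 together with the tie-breaking rule above can be sketched as follows: each code in the target sequence is looked up in a code-to-sentences index, and when several sentences share a code, either the most frequently used one is taken or one is drawn at random. The usage counts, placeholder sentences and function name are hypothetical; the patent does not specify how usage is tracked.

```python
import random
from typing import Dict, List, Optional, Sequence


def assemble_text(
    target_sequence: Sequence[str],
    index: Dict[str, List[str]],
    usage_counts: Optional[Dict[str, int]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Join one corpus sentence per code, in target-sequence order."""
    rng = rng if rng is not None else random.Random()
    lines = []
    for code in target_sequence:
        candidates = index.get(code)
        if not candidates:
            raise KeyError(f"no corpus sentence has code {code!r}")
        if usage_counts is not None:
            # Prefer the most frequently used sentence for this code.
            chosen = max(candidates, key=lambda s: usage_counts.get(s, 0))
        else:
            chosen = rng.choice(candidates)
        lines.append(chosen)
    return "\n".join(lines)


index = {"i_love_2": ["love line 1", "love line 2"], "i_past_2": ["past line"]}
counts = {"love line 1": 3, "love line 2": 10}
print(assemble_text(["i_love_2", "i_past_2"], index, usage_counts=counts))
# love line 2
# past line
```

Offering all combinations to the user, the other option mentioned, would replace the single choice per code with an enumeration over the candidate lists.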
As can be seen from the above, in the embodiments provided by the present invention, a target code can be input into a preset model to output the corresponding target sequence code, where the target code is the first-sentence code of a to-be-generated text that meets a special format requirement; the text that meets the special format requirement is then generated according to the target sequence code. Because the format constraints are encoded into the text generation process, the generated text meets the specific format requirement.
The training process of the preset model provided by the embodiments of the present invention is described below with reference to Fig. 2. Referring to Fig. 2, Fig. 2 is a schematic flowchart of training the preset model provided by an embodiment of the present invention, comprising:
201. Obtain each text.
In this embodiment, each text in the corpus can be obtained first. The corpus includes at least multiple texts that meet the special format requirement. For example, if the special format requirement is the format requirement of poems, the corpus includes at least multiple poems in that same format, each poem being one text; if it is the format requirement of lyrics, the corpus includes at least multiple sets of lyrics in that same format.
202. Encode the sentences in each text according to the special format requirement, to obtain the sentence codes of the sentences in each text.
In this embodiment, the special format requirement includes a rhyme requirement, a rhythm requirement and/or a content requirement, and the sentences in each text can be encoded according to it to obtain their sentence codes. Details are as follows:
Perform rhyme encoding on the sentences in each text based on the final of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme code corresponding to the rhyme requirement;
and/or
perform semantic encoding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content code corresponding to the content requirement;
and/or
perform rhythm encoding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm code corresponding to the rhythm requirement;
determine the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes and/or the content codes.
In this embodiment, the format requirements included in the special format requirement can be determined first. If it includes the rhyme requirement, rhyme encoding is performed on the sentences in each text according to the final of the last character of each sentence, yielding their rhyme codes; if it includes the rhythm requirement, rhythm encoding is performed based on sentence length, yielding their rhythm codes; if it includes the content requirement, semantic encoding is performed based on the target words of each sentence, yielding their content codes. Finally, the sentence code is determined from the rhythm code, the content code and/or the rhyme code. In other words, different encoding methods can be chosen according to the format requirement of the text to be output. For example, RAP lyrics involve rhyme, content and rhythm requirements, so the sentence code of each sentence is generated from its rhyme code, rhythm code and content code; a modern poem only needs to meet the content requirement, so the content code of each sentence can be used directly as its sentence code.
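Choosing the code components according to which requirements apply can be sketched as below. The requirement names, the argument layout and the keyword-then-final-then-level ordering (taken from the example codes in this description) are assumptions for illustration, and `compose_sentence_code` is a hypothetical name.

```python
from typing import Collection, Optional, Sequence


def compose_sentence_code(
    requirements: Collection[str],
    keywords: Sequence[str] = (),
    final: Optional[str] = None,
    level: Optional[int] = None,
) -> str:
    """Concatenate only the code parts demanded by the format requirements."""
    parts = []
    if "content" in requirements:
        parts.extend(keywords)  # content code
    if "rhyme" in requirements:
        if final is None:
            raise ValueError("rhyme requirement needs the final of the last character")
        parts.append(final)  # rhyme code
    if "rhythm" in requirements:
        if level is None:
            raise ValueError("rhythm requirement needs a rhythm level")
        parts.append(str(level))  # rhythm code
    return "_".join(parts)


rap = {"rhyme", "rhythm", "content"}
modern_poem = {"content"}
print(compose_sentence_code(rap, ["mirror", "face"], "ian", 1))  # mirror_face_ian_1
print(compose_sentence_code(modern_poem, ["mirror", "face"]))    # mirror_face
```

Keeping the component order fixed regardless of which parts are present means codes from the same format are always directly comparable when matched against the corpus.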
The following uses a section of RAP lyrics as an example to explain how individual lyric lines are encoded; texts in other formats may of course also be used, and this is not specifically limited.
The first day I rode the train to Beijing;
The city bustles with traffic, and everything feels fresh to me;
Looking at my own young, tender face in the mirror;
I want to carve out a world of my own in this city.
Compared with free text, RAP lyrics need to meet the following requirements:
1. Rhyme requirement: the lyrics must rhyme. Especially in the verse (main song) section, the last words should rhyme as much as possible (single rhyme, double rhyme, multi-syllable rhyme, etc.); at minimum, the first line and the even-numbered lines must rhyme;
2. Rhythm requirement: the lyrics must follow a specific flow (rhythm). Basic principles include keeping the rhythm of each pair of lines (an odd line plus the following even line, together making up eight beats) the same as far as possible, and avoiding excessive rhythm changes between adjacent pairs (limiting a verse to only one or two tempo changes);
3. Content requirement: the lyrics must read smoothly, and the context must be semantically coherent.
Corresponding to the three requirements that the RAP lyrics need to meet, individual lyric lines are encoded as follows:
A. Rhyme encoding uses the final of the last character;
B. Rhythm encoding uses sentence length. For example, after word segmentation, a lyric line shorter than 8 is level 1, one of length 8 to 15 is level 2, and one longer than 15 is level 3. (Sentences with different rhythms differ in length, but sentences of equal length may still have different rhythms; sentence length reflects a sentence's rhythm to some extent but cannot fully determine it, so encoding rhythm by sentence length yields an approximate rhythm code for each sentence.)
C. Semantic encoding uses the target words of each sentence. The target words of each lyric line can be extracted by a keyword extraction algorithm such as TextRank, or by other means; this is not specifically limited, as long as the target words of each lyric line can be extracted.
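These three coding schemes make the transitions between lines satisfy the rhyme, rhythm, and content requirements of the lyrics at the same time.
In this way, this section of RAP lyrics (translated from Chinese; the rhyme code "ian" is the final of the last character of each original Chinese line) is converted into the following codes: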
The first day my train arrived in Beijing -> train_Beijing_ian_2;
The city bustling with traffic makes everything feel fresh to me -> city_ian_1;
Looking at my own young face in the mirror -> mirror_face_ian_1;
I want to carve out my own piece of sky in this city -> city_ian_2.
Using the above approach, the sentences in each text in the corpus that meets the special format requirement can be encoded according to the special format requirement.
203. Iteratively update the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text.
In this embodiment, after the sentence codes of the sentences in each text are obtained, the model parameters of the recurrent neural network (RNN) model can be iteratively updated based on those sentence codes until a preset iteration termination condition is reached. That is, the RNN model is used to model the generation process of text with the special format requirement. The RNN model fits the training data by maximum likelihood and, combined with the dedicated encoding of each single sentence in each text (i.e., the specific coding that satisfies the special format requirement), the transitions between consecutive sentences of the generated text automatically satisfy the special format requirement. The training data is constructed in a way similar to a natural language model (language model): for an input sequence <s1, s2, ..., sn>, the output is <s2, s3, ..., sn, EOS>, where s1 is the code of the first sentence in each text, s2 is the code of the second sentence, sn is the code of the n-th sentence, and EOS is the end marker. In other words, when the RNN model is trained on a single text, the entire input is one sequence and the entire output is one sequence: for example, s1 is fed into the model and the model outputs s2; s2 is then fed into the model and the model outputs s3 (i.e., each output becomes the next input, achieving prediction), and so on until the end marker EOS is output. Training on that text then ends, and training continues on the other texts until the preset iteration termination condition is reached.
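The shifted input/target construction described above can be sketched in a few lines. This is a hypothetical helper, not the patent's code: it only prepares the sequences (and an integer vocabulary an RNN would need); the RNN and the maximum-likelihood (cross-entropy) loss computed over each input/target position are not shown.

```python
EOS = "EOS"  # end-of-sequence marker appended to every target sequence


def make_training_pair(codes: list[str]) -> tuple[list[str], list[str]]:
    """Shift one text's sentence codes: input <s1..sn>, target <s2..sn, EOS>."""
    return list(codes), list(codes[1:]) + [EOS]


def build_vocab(texts: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to every distinct sentence code (EOS gets id 0)."""
    vocab = {EOS: 0}
    for codes in texts:
        for code in codes:
            vocab.setdefault(code, len(vocab))
    return vocab


def to_ids(codes: list[str], vocab: dict[str, int]) -> list[int]:
    """Map a code sequence to the integer ids an embedding layer would consume."""
    return [vocab[c] for c in codes]
```

Applied to the four example codes in this section, the target sequence is the last three codes followed by `EOS`, so the model learns to predict each next sentence code and finally the end marker.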
It should be noted that, after each text has been trained, the following operations can be performed to determine whether the preset iteration termination condition has been reached:
Determine whether the number of iterations has reached a preset value; if so, determine that the preset iteration termination condition is met;
Or,
Determine whether the model parameters of the recurrent neural network model have converged; if so, determine that the preset iteration termination condition is met.
That is, after training on each text is completed, determine whether the current number of iterations has reached a preset value, or whether the model parameters of the recurrent neural network model have converged. If the current number of iterations has reached the preset value or the model parameters have converged, the iteration termination condition is determined to be met.
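The two termination checks can be expressed as one predicate evaluated after each text. This is a sketch under an explicit assumption: the source does not define "converged", so here convergence means that no parameter changed by more than a hypothetical tolerance `tol` since the previous check.

```python
from typing import Optional, Sequence


def should_stop(iteration: int,
                max_iterations: int,
                prev_params: Optional[Sequence[float]],
                params: Sequence[float],
                tol: float = 1e-4) -> bool:
    """Return True when either termination condition holds.

    1. The iteration count has reached the preset value `max_iterations`.
    2. The parameters have converged, approximated here (an assumption) as
       every parameter changing by less than `tol` since the previous check.
    """
    if iteration >= max_iterations:
        return True
    if prev_params is None:  # first check: nothing to compare against yet
        return False
    largest_change = max((abs(a - b) for a, b in zip(prev_params, params)),
                         default=float("inf"))  # no parameters: never "converged"
    return largest_change < tol
```

In a training loop, the caller would keep a copy of the flattened parameters from the previous text and call `should_stop` after each text, exiting once it returns True.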
204. When the preset iteration termination condition is reached, determine the recurrent neural network model at that point as the preset model.
As can be seen from the above, in the embodiment provided by the present invention, each text in the corpus that meets the special format requirement is first obtained; the sentences in each text are then encoded according to the special format requirement to obtain the sentence codes of the sentences in each text; the model parameters of the recurrent neural network model are then iteratively updated based on those sentence codes, and the recurrent neural network model at the end of iteration is determined as the preset model. Because the format constraints are encoded into the model's training process, text predicted by the model not only keeps each sentence, and the transitions between sentences, logically coherent, but also meets the format requirements of the particular text type.
The text generation method provided by the embodiment of the present invention has been described above; the text generation device provided by the embodiment of the present invention is described below with reference to Fig. 3.
Referring to Fig. 3, which is a schematic diagram of the virtual structure of the text generation device provided by an embodiment of the present invention, the text generation device includes:
an acquiring unit 301, configured to obtain a target code, the target code being the first-sentence code of the text to be generated that meets the special format requirement;
a processing unit 302, configured to input the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
a generation unit 303, configured to generate, according to the target sequence code, the text that meets the special format requirement.
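The processing unit's step of turning one first-sentence code into a target sequence code can be pictured as an autoregressive loop. In this sketch the trained preset model is replaced by a hypothetical callable `next_code`, greedy decoding is an assumption, and the returned sequence is assumed to include the first code; the source specifies none of these details.

```python
from typing import Callable

EOS = "EOS"  # end marker the model emits when the text should finish


def generate_codes(first_code: str,
                   next_code: Callable[[list[str]], str],
                   max_len: int = 16) -> list[str]:
    """Extend the first-sentence code one code at a time until EOS or max_len.

    `next_code` stands in for the trained preset model: given the codes produced
    so far, it returns the next code (e.g. the most probable one under greedy
    decoding, which is an assumption here).
    """
    codes = [first_code]
    while len(codes) < max_len:
        nxt = next_code(codes)
        if nxt == EOS:
            break
        codes.append(nxt)
    return codes
```

A lookup table such as `{"train_Beijing_ian_2": "city_ian_1", "city_ian_1": "EOS"}` wrapped in a lambda is enough to exercise the loop without a real model; `max_len` guards against a model that never emits `EOS`.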
Optionally, the target model is a recurrent neural network model, and the device further includes:
a training unit 304, the training unit 304 being configured to:
obtain each text;
encode the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text;
iteratively update the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text;
when the preset iteration termination condition is reached, determine the recurrent neural network model at that point as the preset model.
Optionally, the special format requirement includes a rhyme requirement, a rhythm requirement, and/or a content requirement, and the training unit 304 encoding the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text includes:
performing rhyme coding on the sentences in each text based on the final of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme codes corresponding to the rhyme requirement;
And/or
performing semantic coding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content codes corresponding to the content requirement;
And/or
performing rhythm coding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm codes corresponding to the rhythm requirement;
determining the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes, and/or the content codes.
Optionally, the device further includes:
a judging unit 305, the judging unit 305 being configured to:
determine whether the number of iterations has reached a preset value, and if so, determine that the preset iteration termination condition is met;
Or,
the judging unit is further configured to:
determine whether the model parameters of the recurrent neural network model have converged, and if so, determine that the preset iteration termination condition is met.
Optionally, the acquiring unit 301 is specifically configured to:
determine whether an operation instruction from a user has been received;
if so, obtain the target code from the corpus according to the user's operation instruction;
if not, determine a sentence code randomly selected from the corpus as the target code.
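The acquiring unit's two branches can be sketched as follows. The source does not say what a user's operation instruction contains, so this sketch models it, purely hypothetically, as a keyword to look for in the keyword part of the corpus codes (the codes are assumed to follow the keywords_final_level layout of the examples in this section).

```python
import random
from typing import Optional


def acquire_target_code(corpus_codes: list[str],
                        user_keyword: Optional[str] = None,
                        rng: Optional[random.Random] = None) -> str:
    """Choose the first-sentence code for the text to be generated.

    If a user instruction was received (modelled here, hypothetically, as a
    keyword), return the first corpus code whose keyword part contains it;
    otherwise return a randomly selected corpus code.
    """
    if user_keyword is not None:
        for code in corpus_codes:
            keywords = code.split("_")[:-2]  # drop the rhyme final and rhythm level
            if user_keyword in keywords:
                return code
        raise LookupError(f"no corpus code contains keyword {user_keyword!r}")
    if rng is None:
        rng = random.Random()
    return rng.choice(corpus_codes)
```

Passing a seeded `random.Random` makes the random branch reproducible, which is convenient for testing.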
Optionally, the generation unit 303 is specifically configured to:
determine, in the corpus, target sentence codes identical to the sentence codes in the target sequence code;
generate the text that meets the special format requirement based on the sentences corresponding to the target sentence codes.
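The generation unit's lookup, from predicted codes back to corpus sentences carrying identical codes, can be sketched with a simple index. The function names are hypothetical, and picking randomly among several sentences that share a code is an assumption, since the source does not say how such ties are resolved.

```python
import random
from typing import Optional


def build_index(corpus: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each sentence code to every corpus sentence carrying that exact code."""
    index: dict[str, list[str]] = {}
    for sentence, code in corpus:
        index.setdefault(code, []).append(sentence)
    return index


def realize(codes: list[str],
            index: dict[str, list[str]],
            rng: Optional[random.Random] = None) -> list[str]:
    """Turn a target code sequence into text by looking up identical codes.

    When several sentences share a code, one is picked at random (an assumption;
    the source does not say how ties are broken).
    """
    if rng is None:
        rng = random.Random()
    lines = []
    for code in codes:
        candidates = index.get(code)
        if not candidates:
            raise KeyError(f"no corpus sentence has code {code!r}")
        lines.append(rng.choice(candidates))
    return lines
```

Because the output sentences are drawn from the corpus, every generated line is guaranteed to carry exactly the rhyme, rhythm, and keyword code the model predicted.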
For the interaction between the units of the text generation device in this embodiment, refer to the description of the embodiments shown in Fig. 1 and Fig. 2; details are not repeated here.
As can be seen from the above, in the embodiment provided by the present invention, a target code can be input into a preset model to output the target sequence code corresponding to the target code, where the target code is the first-sentence code of the text to be generated that meets the special format requirement; the text that meets the special format requirement is then generated according to the target sequence code. Because format constraints are encoded into the text generation process, the generated text meets the specific format requirement.
Fig. 3 above describes the text generation device in the embodiment of the present invention from the perspective of modular functional entities. The text generation device in the embodiment of the present invention is described in detail below from the perspective of hardware processing. Referring to Fig. 4, an embodiment of the text generation device 400 in the embodiment of the present invention includes:
an input device 401, an output device 402, a processor 403, and a memory 404 (there may be one or more processors 403; one processor 403 is taken as an example in Fig. 4). In some embodiments of the present invention, the input device 401, the output device 402, the processor 403, and the memory 404 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
By calling the operation instructions stored in the memory 404, the processor 403 is configured to perform the following steps:
obtaining a target code, the target code being the first-sentence code of the text to be generated that meets the special format requirement;
inputting the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
generating, according to the target sequence code, the text that meets the special format requirement.
By calling the operation instructions stored in the memory 404, the processor 403 is further configured to perform any of the implementations in the embodiments corresponding to Fig. 1 and Fig. 2.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
An embodiment of the present invention further provides a storage medium storing a program which, when executed by a processor, implements the text generation method.
An embodiment of the present invention further provides a processor configured to run a program, where the program, when running, executes the text generation method.
An embodiment of the present invention further provides a device including a processor, a memory, and a program stored in the memory and executable on the processor; when executing the program, the processor performs the following steps:
obtaining a target code, the target code being the first-sentence code of the text to be generated that meets the special format requirement;
inputting the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
generating, according to the target sequence code, the text that meets the special format requirement.
In a specific implementation, when the processor executes the program, any implementation in the embodiments corresponding to Fig. 1 and Fig. 2 may be realized.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present invention further provides a computer program product which, when executed on a text generation device, is adapted to execute a program initialized with the following method steps:
obtaining a target code, the target code being the first-sentence code of the text to be generated that meets the special format requirement;
inputting the target code into a preset model to output a target sequence code corresponding to the target code, where the preset model is obtained by training a target model on training data, the training data includes each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
generating, according to the target sequence code, the text that meets the special format requirement.
In a specific implementation, when the computer program product is executed, any implementation in the embodiments corresponding to Fig. 1 and Fig. 2 may be realized.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable text generation device to produce a machine, such that the instructions executed by the processor of the computer or other programmable text generation device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable text generation device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable text generation device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The above are merely embodiments of the present invention and are not intended to limit the present invention. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A text generation method, comprising:
obtaining a target code, the target code being the first-sentence code of a text to be generated that meets a special format requirement;
inputting the target code into a preset model to output a target sequence code corresponding to the target code, wherein the preset model is obtained by training a target model on training data, the training data comprises each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
generating, according to the target sequence code, the text that meets the special format requirement.
2. The method according to claim 1, wherein the target model is a recurrent neural network model, and before inputting the target code into the preset model to output the target sequence code corresponding to the target code, the method further comprises:
obtaining each text;
encoding the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text;
iteratively updating the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text;
when a preset iteration termination condition is reached, determining the recurrent neural network model at that point as the preset model.
3. The method according to claim 2, wherein the special format requirement comprises a rhyme requirement, a rhythm requirement, and/or a content requirement, and encoding the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text comprises:
performing rhyme coding on the sentences in each text based on the final of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme codes corresponding to the rhyme requirement;
And/or
performing semantic coding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content codes corresponding to the content requirement;
And/or
performing rhythm coding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm codes corresponding to the rhythm requirement;
determining the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes, and/or the content codes.
4. The method according to any one of claims 1 to 3, wherein obtaining the target code comprises:
determining whether an operation instruction from a user has been received;
if so, obtaining the target code from the corpus according to the user's operation instruction;
if not, determining a sentence code randomly selected from the corpus as the target code.
5. The method according to any one of claims 1 to 3, wherein generating, according to the target sequence code, the text that meets the special format requirement comprises:
determining, in the corpus, target sentence codes identical to the sentence codes in the target sequence code;
generating the text that meets the special format requirement based on the sentences corresponding to the target sentence codes.
6. A text generation device, comprising:
an acquiring unit, configured to obtain a target code, the target code being the first-sentence code of a text to be generated that meets a special format requirement;
a processing unit, configured to input the target code into a preset model to output a target sequence code corresponding to the target code, wherein the preset model is obtained by training a target model on training data, the training data comprises each text in a corpus that meets the special format requirement and the sentence codes of the sentences in each text, and the sentence codes of the sentences in each text meet the special format requirement;
a generation unit, configured to generate, according to the target sequence code, the text that meets the special format requirement.
7. The device according to claim 6, wherein the target model is a recurrent neural network model, and the device further comprises:
a training unit, the training unit being configured to:
obtain each text;
encode the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text;
iteratively update the model parameters of the recurrent neural network model based on the sentence codes of the sentences in each text;
when a preset iteration termination condition is reached, determine the recurrent neural network model at that point as the preset model.
8. The device according to claim 7, wherein the special format requirement comprises a rhyme requirement, a rhythm requirement, and/or a content requirement, and the training unit encoding the sentences in each text according to the special format requirement, respectively, to obtain the sentence codes of the sentences in each text comprises:
performing rhyme coding on the sentences in each text based on the final of the last character of each sentence, to obtain the rhyme codes of the sentences in each text, the rhyme codes corresponding to the rhyme requirement;
And/or
performing semantic coding on the sentences in each text based on the target words of each sentence, to obtain the content codes of the sentences in each text, the content codes corresponding to the content requirement;
And/or
performing rhythm coding on the sentences in each text based on the length of each sentence, to obtain the rhythm codes of the sentences in each text, the rhythm codes corresponding to the rhythm requirement;
determining the sentence codes of the sentences in each text according to the rhyme codes, the rhythm codes, and/or the content codes.
9. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the steps of the text generation method according to any one of claims 1 to 5.
10. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the text generation method according to any one of claims 1 to 5.
CN201910409516.4A 2019-05-15 2019-05-15 A kind of generation method and relevant device of text Pending CN110134960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910409516.4A CN110134960A (en) 2019-05-15 2019-05-15 A kind of generation method and relevant device of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910409516.4A CN110134960A (en) 2019-05-15 2019-05-15 A kind of generation method and relevant device of text

Publications (1)

Publication Number Publication Date
CN110134960A (en) 2019-08-16

Family

ID=67574673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910409516.4A Pending CN110134960A (en) 2019-05-15 2019-05-15 A kind of generation method and relevant device of text

Country Status (1)

Country Link
CN (1) CN110134960A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050262042A1 (en) * 2004-05-19 2005-11-24 International Business Machines Corporation Generating a dynamic content creation program
CN109002433A (en) * 2018-05-30 2018-12-14 出门问问信息科技有限公司 A kind of document creation method and device
CN109670185A (en) * 2018-12-27 2019-04-23 北京百度网讯科技有限公司 Document creation method and device based on artificial intelligence

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368697A (en) * 2020-02-28 2020-07-03 中国建设银行股份有限公司 Information identification method and device
CN111444695A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Text generation method, device and equipment based on artificial intelligence and storage medium
CN111444695B (en) * 2020-03-25 2022-03-01 腾讯科技(深圳)有限公司 Text generation method, device and equipment based on artificial intelligence and storage medium
CN111581916A (en) * 2020-05-15 2020-08-25 北京字节跳动网络技术有限公司 Text generation method and device, electronic equipment and computer readable medium
CN111581916B (en) * 2020-05-15 2022-03-01 北京字节跳动网络技术有限公司 Text generation method and device, electronic equipment and computer readable medium
CN111783455A (en) * 2020-07-13 2020-10-16 网易(杭州)网络有限公司 Training method and device of text generation model and text generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination