CN106227714A - Method and apparatus for obtaining keywords for generating poems based on artificial intelligence

Info

Publication number
CN106227714A
Authority
CN
China
Prior art keywords
keyword
word
poem
base
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610556319.1A
Other languages
Chinese (zh)
Inventor
和为
王哲
伍海洋
李伟
何中军
胡晓光
刘璇
吴甜
吴华
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610556319.1A
Publication of CN106227714A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The object of the invention is to provide a method and apparatus for obtaining keywords for generating poems based on artificial intelligence. The method according to the invention includes: extracting one or more base keywords from poem request information; when a base keyword is not in a poem corpus, obtaining one or more expanded keywords corresponding to that base keyword; and selecting, from the one or more expanded keywords, at least one expanded keyword contained in the poem corpus as the corpus keyword corresponding to that base keyword, so that a corresponding verse can be generated from the corpus keyword. The advantage of the invention is that expanding the base keywords converts them into corpus keywords, so that the automatic poem-generation mechanism can cope with the continual renewal and change of modern language.

Description

Method and apparatus for obtaining keywords for generating poems based on artificial intelligence
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for obtaining keywords for generating poems based on artificial intelligence.
Background
Artificial intelligence (Artificial Intelligence), abbreviated AI, is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems.
Existing poem generation techniques can typically accept only keywords as input and cannot accept long sentences. The accepted keywords usually have to be words common in classical poetry, and the generation process itself relies mainly on a corpus of classical poetry. However, the natural language people use has evolved, and modern vocabulary that does not occur in classical poetry now appears, for example names of new things or of contemporary people such as "hot pot" and "Zhou Jielun". Moreover, some words now carry meanings entirely different from their ancient ones. In these cases, existing poem generation methods often cannot fuse new vocabulary with the rhythm of classical poetry, and cannot properly understand and process the natural language needed to generate a poem.
Summary of the invention
The object of the invention is to provide a method and apparatus for obtaining keywords for generating poems based on artificial intelligence.
According to one aspect of the invention, a method for obtaining keywords for generating poems based on artificial intelligence is provided, wherein the method includes the following steps:
A. extracting one or more base keywords from poem request information;
B. when a base keyword is not in the poem corpus, obtaining one or more expanded keywords corresponding to that base keyword;
C. selecting, from the one or more expanded keywords, at least one expanded keyword contained in the poem corpus as the corpus keyword corresponding to that base keyword, so as to generate a corresponding verse based on the corpus keyword.
According to another aspect of the invention, a word-taking device for obtaining keywords for generating poems based on artificial intelligence is provided, wherein the word-taking device includes:
an extraction device, for extracting one or more base keywords from poem request information;
a first acquisition device, for obtaining, when a base keyword is not in the poem corpus, one or more expanded keywords corresponding to that base keyword;
a first selection device, for selecting, from the one or more expanded keywords, at least one expanded keyword contained in the poem corpus as the corpus keyword corresponding to that base keyword, so as to generate a corresponding verse based on the corpus keyword.
Compared with the prior art, the invention has the following advantages. By expanding the base keywords, it converts base keywords into corpus keywords, so that poems can be generated automatically that both satisfy the original poem request information and meet the requirements of classical poetry on rhythm, wording and so on. This fuses modern language and culture with poem types and poetic diction, so that the automatic poem-generation mechanism can cope with the continual renewal and change of modern language, and can therefore satisfy users' demands for generated poems more broadly.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 shows a flowchart of a method according to the invention for obtaining keywords for generating poems based on artificial intelligence;
Fig. 2 shows a structural schematic diagram of a word-taking device according to the invention for obtaining keywords for generating poems based on artificial intelligence;
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description
The invention is described in further detail below with reference to the drawings.
Fig. 1 shows a flowchart of a method according to the invention for obtaining keywords for generating poems based on artificial intelligence. The method according to the invention includes step S1, step S2 and step S3.
The method according to the invention is implemented by a word-taking device contained in a computer device. The computer device is an electronic device that can automatically perform numerical computation and/or information processing according to instructions that are set or stored in advance; its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP) and embedded devices. The computer device includes network devices and user devices.
The network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) that consists of a large number of hosts or network servers, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The user device includes but is not limited to any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, voice-control device or the like, for example a computer, tablet computer, smartphone, PDA or handheld device.
With reference to Fig. 1, in step S1, the word-taking device extracts one or more base keywords from poem request information.
The poem request information is request information for generating a poem. Preferably, the poem request information includes one or more base keywords.
Preferably, the poem request information takes the form of a long sentence with a complex structure.
Specifically, the word-taking device uses natural language processing techniques such as semantic analysis and word segmentation to extract one or more base keywords from the poem request information.
Preferably, the word-taking device sorts the words in the poem request information by their inverse document frequency (IDF) and selects the required one or more base keywords according to the resulting IDF ranking.
The IDF value of a word can be obtained by the following formula (1):
$$idf_t = \log \frac{|D|}{|D_t|} \tag{1}$$
where $idf_t$ is the IDF value of keyword t, $|D|$ is the total number of documents in the corpus, and $|D_t|$ is the number of documents that contain keyword t.
Note that the IDF value of a keyword may differ depending on the corpus used. For example, for the same keyword, the IDF value computed over a poem corpus of classical poetry may differ from the one computed over a web page corpus containing all web pages.
Those skilled in the art should choose the corpus according to the actual situation and requirements, and obtain the IDF values of the keywords accordingly. For example, the IDF value of each keyword in the poem request information can be computed directly over the poem corpus. As another example, when a keyword in the poem request information is found not to be in the poem corpus, its IDF value can be computed over the web page corpus and then adjusted using an adjustment weight between the poem corpus and the web page corpus, and so on. Details are omitted here.
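As an illustration of formula (1) and of the IDF-based ranking in step S1, the following minimal sketch computes IDF values over a toy corpus and ranks candidate words. The representation of each document as a set of already segmented words, the function names `idf` and `rank_by_idf`, the toy data, and the choice to rank higher IDF first are all assumptions made for this example; the patent does not specify them.

```python
import math

def idf(term, documents):
    """IDF per formula (1): log(|D| / |D_t|).

    Returns None when no document contains the term, since |D_t| = 0
    leaves the formula undefined (the word is "not in the corpus").
    """
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        return None
    return math.log(len(documents) / containing)

def rank_by_idf(candidates, documents):
    """Rank candidates by IDF, highest first (an assumed ordering)."""
    scored = [(word, idf(word, documents)) for word in candidates]
    scored = [(word, score) for word, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy poem corpus: each document is a set of segmented words (hypothetical data).
poem_corpus = [
    {"moon", "river", "sail"},
    {"moon", "wine"},
    {"river", "moon", "frost"},
    {"wine", "flower"},
]

ranking = rank_by_idf(["moon", "river", "frost", "hot pot"], poem_corpus)
```

Here "hot pot" receives no IDF value because it does not occur in the toy poem corpus, which is the situation that triggers the keyword expansion of step S2.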
Then, in step S2, when a base keyword is not in the poem corpus, the word-taking device obtains one or more expanded keywords corresponding to that base keyword.
The poem corpus is a corpus composed of poems, for example a corpus containing various materials such as Tang poems, Song ci and Yuan qu.
Specifically, when a base keyword is not in the poem corpus, the word-taking device uses multiple web pages to expand the base keyword, so as to obtain one or more expanded keywords corresponding to it.
Preferably, the word-taking device obtains one or more pieces of web page information corresponding to the base keyword; it then extracts, from the one or more pieces of web page information, the expanded keywords corresponding to each of them.
The expanded keywords are different from the base keyword.
For example, the word-taking device performs a web query based on a base keyword to obtain one or more result pages corresponding to that base keyword and, based on the base keyword, extracts from these result pages words that are similar or related to it as expanded keywords.
The word-taking device can determine the expanded keywords that are similar or related to the base keyword through natural language processing techniques such as semantic analysis.
More preferably, the word-taking device searches a web page database based on the base keyword to obtain multiple pieces of web page information corresponding to it and, based on the quality information of each piece of web page information, selects from them one or more pieces whose quality information satisfies a predetermined quality condition.
Those skilled in the art will appreciate that the quality information of a web page can be determined from parameters such as its number of visits, number of external links and user dwell time; details are omitted here.
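The quality-based page selection can be sketched as follows. The patent names the parameters (visits, external links, dwell time) but not how they are combined, so the weighted sum, the weights, the threshold, and the `Page` record below are purely hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Page:
    page_id: str           # hypothetical identifier, not a real URL
    visits: int            # number of visits
    external_links: int    # number of external links
    dwell_seconds: float   # average user dwell time

def quality(page, w_visits=1.0, w_links=2.0, w_dwell=0.5):
    """Assumed quality score: a weighted sum of the three parameters."""
    return (w_visits * page.visits
            + w_links * page.external_links
            + w_dwell * page.dwell_seconds)

def filter_pages(pages, min_quality):
    """Keep pages whose score meets the (assumed) predetermined condition."""
    return [p for p in pages if quality(p) >= min_quality]

pages = [
    Page("page-a", visits=1000, external_links=20, dwell_seconds=60.0),
    Page("page-b", visits=10, external_links=0, dwell_seconds=5.0),
]
kept = filter_pages(pages, min_quality=100.0)
```

Any other monotone combination of the parameters, or a learned ranking, would fit the description equally well.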
Then, in step S3, the word-taking device selects, from the one or more expanded keywords, at least one expanded keyword contained in the poem corpus as the corpus keyword corresponding to the base keyword, so as to generate a corresponding verse based on the corpus keyword.
Specifically, the word-taking device checks whether each of the one or more expanded keywords is contained in the poem corpus; when an expanded keyword is contained in the poem corpus, it is used as a corpus keyword.
In one example according to the invention, the word-taking device obtains the base keyword "Liu Dehua" in step S1 and determines that this base keyword is not contained in the poem corpus. Then, in step S2, the word-taking device searches for and obtains one or more web pages for the base keyword "Liu Dehua", and obtains from them multiple expanded keywords corresponding to the base keyword, such as "king", "singer" and "performer". Then, in step S3, the word-taking device selects the expanded keyword "king", which is contained in the poem corpus, as the corpus keyword corresponding to the base keyword "Liu Dehua", so as to generate a corresponding verse based on it.
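The flow of steps S2 and S3 for a single base keyword can be sketched as below. The web lookup is replaced by a stub dictionary, and the function name `to_corpus_keywords`, the `fetch_expansions` callback and the toy vocabulary are assumptions for illustration only; a real system would query a web page database and extract related words with NLP techniques.

```python
def to_corpus_keywords(base_keyword, poem_vocabulary, fetch_expansions):
    """Map one base keyword to corpus keywords.

    A base keyword already in the poem corpus is used directly; otherwise
    its expanded keywords are fetched (S2) and only those contained in the
    poem corpus are kept (S3).
    """
    if base_keyword in poem_vocabulary:
        return [base_keyword]
    # Expanded keywords must differ from the base keyword.
    expansions = [w for w in fetch_expansions(base_keyword) if w != base_keyword]
    return [w for w in expansions if w in poem_vocabulary]

# Stub standing in for web search plus related-word extraction (hypothetical).
STUB_EXPANSIONS = {"Liu Dehua": ["king", "singer", "performer"]}

def stub_fetch(base_keyword):
    return STUB_EXPANSIONS.get(base_keyword, [])

poem_vocabulary = {"king", "moon", "river"}
result = to_corpus_keywords("Liu Dehua", poem_vocabulary, stub_fetch)
```

With this toy data the base keyword "Liu Dehua" maps to the single corpus keyword "king", matching the example in the text.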
Preferably, when a base keyword corresponds to one or more expanded keywords, the word-taking device obtains the weight information of each of the expanded keywords and, based on this weight information, selects at least one of them as the corpus keyword corresponding to the base keyword.
The weight information indicates the importance of an expanded keyword. Examples are the IDF value of the expanded keyword in the web page database or, when the expanded keyword is contained in the poem corpus, its IDF value in the poem corpus.
More preferably, from the multiple expanded keywords corresponding to a base keyword, the word-taking device selects a predetermined number (for example x) of expanded keywords according to their IDF values in the web page database. It then checks whether each of these expanded keywords is in the poem corpus, obtains the poem-corpus IDF value of each expanded keyword that is contained in the poem corpus (for example y of them, where y ≤ x), and selects at least one expanded keyword based on this IDF value as the corpus keyword corresponding to the base keyword.
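A minimal sketch of this two-stage selection follows. The text does not say whether higher or lower IDF values are preferred, so preferring the highest value at both stages is an assumption, as are the function name, the dictionaries of precomputed IDF values and the toy numbers.

```python
def select_corpus_keyword(expansions, web_idf, poem_idf, x=3):
    """Two-stage selection of one corpus keyword.

    Stage 1: keep the x expanded keywords with the highest web IDF.
    Stage 2: among those that occur in the poem corpus (the y <= x words
    that have a poem-corpus IDF), return the one with the highest
    poem-corpus IDF, or None if none of them is in the poem corpus.
    """
    top_x = sorted(expansions, key=lambda w: web_idf.get(w, 0.0), reverse=True)[:x]
    in_corpus = [w for w in top_x if w in poem_idf]
    if not in_corpus:
        return None
    return max(in_corpus, key=lambda w: poem_idf[w])

# Hypothetical precomputed IDF values.
web_idf = {"king": 2.0, "singer": 3.0, "performer": 2.5, "film": 1.0}
poem_idf = {"king": 1.2, "film": 0.4}   # only words present in the poem corpus

chosen = select_corpus_keyword(["king", "singer", "performer", "film"],
                               web_idf, poem_idf, x=3)
```

Returning several keywords instead of one only requires sorting `in_corpus` and taking a prefix.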
According to a preferred scheme of the invention, the method further includes step S4 (not shown).
In step S4, when a base keyword is contained in the poem corpus, the word-taking device uses that base keyword as a corpus keyword.
According to another preferred scheme of the invention, the method further includes step S5 (not shown) and step S6 (not shown).
In step S5, the word-taking device obtains the type of the poem to be generated.
The poem type is the structural form of the poem, for example five-character ancient verse, seven-character ancient verse, five-character quatrain, five-character regulated verse, and the various ci and qu tune names.
Specifically, the word-taking device can determine the type of the poem to be generated according to the user's input or according to a default type.
Then, in step S6, the word-taking device determines the required total number N of corpus keywords based on the poem type.
Specifically, the word-taking device determines the number of sentences corresponding to the type of the poem to be generated, and determines the required number of keywords from that number of sentences.
Preferably, the word-taking device uses the number of sentences of the obtained poem type as the required total number N of corpus keywords.
For example, if the word-taking device determines in step S5 that the user has selected a seven-character quatrain, then in step S6 it determines that the required total number of corpus keywords is 4. As another example, if the word-taking device receives the user's selection of a five-character regulated verse, then in step S6 it determines that the required total is 8. As a further example, if the type received is the ci tune "Nian Nu Jiao", then in step S6 the word-taking device determines from the sentence pattern of that tune that the corresponding total number of corpus keywords is 8.
According to a preferred embodiment of this scheme, the method further includes step S7 (not shown).
In step S7, when the number of base keywords extracted from the poem request information is greater than N, the word-taking device selects N base keywords from them based on the weight information of each base keyword.
The way the word-taking device selects N base keywords based on their weight information is the same as or similar to the way, described above, in which it selects at least one expanded keyword based on the weight information of the expanded keywords, and is not repeated here.
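Steps S5 to S7 can be sketched together: look up the sentence count N for the poem type, then keep the N highest-weighted base keywords when more than N were extracted. The type labels, the lookup table, the default type and the weights below are hypothetical; only the idea that N equals the number of sentences comes from the text.

```python
# Assumed mapping from poem type to number of sentences (illustrative labels).
SENTENCES_PER_TYPE = {
    "quatrain": 4,
    "regulated_verse": 8,
}

def required_keyword_count(poem_type=None, default_type="quatrain"):
    """S5/S6: N is the sentence count of the chosen (or default) poem type."""
    return SENTENCES_PER_TYPE[poem_type or default_type]

def select_top_n(base_keywords, weights, n):
    """S7: when more than n base keywords exist, keep the n with highest weight."""
    if len(base_keywords) <= n:
        return list(base_keywords)
    ranked = sorted(base_keywords, key=lambda w: weights.get(w, 0.0), reverse=True)
    return ranked[:n]

n = required_keyword_count("quatrain")
weights = {"moon": 0.3, "frost": 1.4, "river": 0.7, "wine": 0.9, "sail": 1.1}
selected = select_top_n(["moon", "frost", "river", "wine", "sail"], weights, n)
```

The weights could be the IDF values discussed earlier; the sketch treats them as given.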
According to another preferred embodiment of this scheme, the method further includes step S8 (not shown).
In step S8, when fewer than N corpus keywords have been determined, the word-taking device expands the determined corpus keywords based on the poem corpus, so as to obtain the remaining number of corpus keywords from the corpus database.
Specifically, the word-taking device uses the language-model probability of each word, obtained by collecting statistics over the poem corpus, to determine the preceding-context or following-context keywords corresponding to the one or more corpus keywords, and uses the obtained context keywords as corpus keywords.
The word-taking device can obtain the preceding-context or following-context keywords of each word directly from the language-model probabilities of that word. Alternatively, it can compute statistics in real time for a given corpus keyword in the poem corpus to obtain its language-model probabilities, and from these obtain the context keywords corresponding to that corpus keyword.
For example, when the corpus keyword obtained is "sunset clouds", the word-taking device, using the language-model probabilities obtained from statistics over the words of the poems in the poem corpus, can determine that the most common following-context keyword is "lone duck", and uses this following-context keyword as a corpus keyword.
As another example, when the corpus keyword obtained is "the Changjiang River", the word-taking device collects statistics over the verses in the poem corpus that contain "the Changjiang River", such as the following:
(1)
Boundless / falling leaves / rustling / down,
Endless / the Changjiang River / billowing / comes.
(2)
Lone sail / distant shadow / blue sky / vanishes,
Only see / the Changjiang River / horizon / flows.
From these two verses it can be determined that the preceding-context keywords of "the Changjiang River" include "endless" and "only see", and that its following-context keywords include "billowing" and "horizon".
The word-taking device can then select from them, based on the language-model probabilities, the corresponding preceding-context or following-context keywords as corpus keywords.
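Collecting the preceding-context and following-context keywords from segmented verses can be sketched as follows. The verses are given as English glosses that are already segmented into words; the glosses, the function name `context_keywords` and the use of simple neighbor counts are assumptions for this illustration.

```python
from collections import Counter

def context_keywords(segmented_lines, keyword):
    """Count the words immediately before and after each occurrence of keyword."""
    before, after = Counter(), Counter()
    for words in segmented_lines:
        for i, word in enumerate(words):
            if word != keyword:
                continue
            if i > 0:
                before[words[i - 1]] += 1
            if i + 1 < len(words):
                after[words[i + 1]] += 1
    return before, after

# English glosses of the two example verses, one list per line (illustrative).
lines = [
    ["boundless", "falling leaves", "rustling", "down"],
    ["endless", "Changjiang River", "billowing", "comes"],
    ["lone sail", "distant shadow", "blue sky", "vanishes"],
    ["only see", "Changjiang River", "horizon", "flows"],
]

before, after = context_keywords(lines, "Changjiang River")
```

Dividing each count by the number of occurrences of the keyword turns these counts into the bigram probabilities used to pick the most likely context keyword.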
Preferably, for the case where K corpus keywords are to be obtained and m corpus keywords (m < K) have been obtained so far, let $W_i$ denote the i-th keyword. The process of obtaining the remaining K - m corpus keywords can then be expressed by the following formula (2):
$$W_{m+1:K} = \arg\max_{W_{m+1:K}} P(W_{m+1:K} \mid W_{1:m}) \tag{2}$$
where $W_{m+1:K}$ denotes the sequence of the (m+1)-th to K-th keywords, and $P(W_{m+1:K} \mid W_{1:m})$ denotes the conditional probability that $W_{m+1:K}$ occurs given $W_{1:m}$ (the sequence of the 1st to m-th words).
According to the Markov assumption (Markov Assumption), the probability of each word depends only on the preceding n-1 words (n is a hyperparameter here, typically set to 5). $P(W_{m+1:K} \mid W_{1:m})$ is solved here with an n-gram language model, which gives the following formula (3):
$$P(W_{m+1:K} \mid W_{1:m}) = \prod_{j=m+1}^{K} P(W_j \mid W_{j-n+1}, \ldots, W_{j-1}) \tag{3}$$
Here $W_{j-n+1}, \ldots, W_{j-1}$ denotes the n-1 words preceding word $W_j$, and $P(W_j \mid W_{j-n+1}, \ldots, W_{j-1})$ denotes the conditional probability of generating $W_j$ given the preceding n-1 words.
The probability $P(W_j \mid W_{j-n+1}, \ldots, W_{j-1})$ can be computed by maximum-likelihood estimation using the following formula (4):
$$P(W_j \mid W_{j-n+1}, \ldots, W_{j-1}) = \frac{C(W_{j-n+1}, \ldots, W_j)}{C(W_{j-n+1}, \ldots, W_{j-1})} \tag{4}$$
where $C(W_{j-n+1}, \ldots, W_j)$ in formula (4) is a frequency count, namely the number of occurrences of the word string $W_{j-n+1}, \ldots, W_j$ in the corpus; similarly, $C(W_{j-n+1}, \ldots, W_{j-1})$ is the number of occurrences of the word string $W_{j-n+1}, \ldots, W_{j-1}$ in the corpus.
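Formulas (2) to (4) can be illustrated with a small n-gram sketch. Two simplifications are assumptions of this example rather than part of the text: the argmax of formula (2) is approximated greedily, one word at a time, and the toy data uses n = 2 instead of the typical n = 5. The function names and the keyword sequences are hypothetical.

```python
from collections import Counter

def count_ngrams(sequences, n):
    """C(.): occurrences of every word string of length n in the corpus."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def mle_probability(word, context, full, prefix):
    """Formula (4): C(context + word) / C(context); 0.0 for unseen contexts."""
    denominator = prefix[tuple(context)]
    if denominator == 0:
        return 0.0
    return full[tuple(context) + (word,)] / denominator

def extend_keywords(known, k, sequences, n=2):
    """Greedy approximation of formula (2): repeatedly append the most
    probable next word until k keywords are collected or none remains."""
    if n < 2:
        raise ValueError("n must be at least 2")
    full = count_ngrams(sequences, n)
    prefix = count_ngrams(sequences, n - 1)
    vocabulary = sorted({w for seq in sequences for w in seq})
    result = list(known)
    while len(result) < k:
        context = result[-(n - 1):]
        best_word, best_p = None, 0.0
        for word in vocabulary:
            if word in result:
                continue  # do not repeat a keyword
            p = mle_probability(word, context, full, prefix)
            if p > best_p:
                best_word, best_p = word, p
        if best_word is None:
            break
        result.append(best_word)
    return result

# Toy keyword sequences standing in for a poem corpus (hypothetical data).
sequences = [
    ["sunset clouds", "lone duck", "autumn water"],
    ["sunset clouds", "lone duck", "sky"],
    ["sunset clouds", "evening"],
    ["lone duck", "autumn water", "sky"],
]

keywords = extend_keywords(["sunset clouds"], 3, sequences, n=2)
```

An exact solution of formula (2) would search over whole sequences, for example with beam search; the greedy loop is used only to keep the sketch short.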
According to another preferred scheme of the invention, the method further includes the word-taking device determining the poem request information based on received voice information; and/or the word-taking device converting the generated verse into voice information.
Those skilled in the art should convert the text or voice corresponding to the verse into the appropriate form according to the actual situation and requirements; details are omitted here.
The method according to the invention expands the base keywords to convert them into corpus keywords, so that poems can be generated automatically that both satisfy the original poem request information and meet the requirements of classical poetry on rhythm, wording and so on. This fuses modern language and culture with poem types and poetic diction, so that the automatic poem-generation mechanism can cope with the continual renewal and change of modern language, and can therefore satisfy users' demands for generated poems more broadly.
Fig. 2 shows a structural schematic diagram of a word-taking device according to the invention for obtaining keywords for generating poems based on artificial intelligence. The word-taking device according to the invention includes an extraction device 1, a first acquisition device 2 and a first selection device 3.
With reference to Fig. 2, the extraction device 1 extracts one or more base keywords from poem request information.
The poem request information is request information for generating a poem. Preferably, the poem request information includes one or more base keywords.
Preferably, the poem request information takes the form of a long sentence with a complex structure.
Specifically, the extraction device 1 uses natural language processing techniques such as semantic analysis and word segmentation to extract one or more base keywords from the poem request information.
Preferably, the extraction device 1 sorts the words in the poem request information by their inverse document frequency (IDF) and selects the required one or more base keywords according to the resulting IDF ranking.
The IDF value of a word can be obtained by the following formula (1):
$$idf_t = \log \frac{|D|}{|D_t|} \tag{1}$$
where $idf_t$ is the IDF value of keyword t, $|D|$ is the total number of documents in the corpus, and $|D_t|$ is the number of documents that contain keyword t.
Note that the IDF value of a keyword may differ depending on the corpus used. For example, for the same keyword, the IDF value computed over a poem corpus of classical poetry may differ from the one computed over a web page corpus containing all web pages.
Those skilled in the art should choose the corpus according to the actual situation and requirements, and obtain the IDF values of the keywords accordingly. For example, the IDF value of each keyword in the poem request information can be computed directly over the poem corpus. As another example, when a keyword in the poem request information is found not to be in the poem corpus, its IDF value can be computed over the web page corpus and then adjusted using an adjustment weight between the poem corpus and the web page corpus, and so on. Details are omitted here.
Then, when a base keyword is not in the poem corpus, the first acquisition device 2 obtains one or more expanded keywords corresponding to that base keyword.
The poem corpus is a corpus composed of poems, for example a corpus containing various materials such as Tang poems, Song ci and Yuan qu.
Specifically, when a base keyword is not in the poem corpus, the first acquisition device 2 uses multiple web pages to expand the base keyword, so as to obtain one or more expanded keywords corresponding to it.
Preferably, a sub-acquisition device (not shown) contained in the first acquisition device 2 obtains one or more pieces of web page information corresponding to the base keyword; then a sub-extraction device (not shown) contained in the first acquisition device 2 extracts, from the one or more pieces of web page information, the expanded keywords corresponding to each of them.
The expanded keywords are different from the base keyword.
For example, the sub-acquisition device performs a web query based on a base keyword to obtain one or more result pages corresponding to that base keyword, and the sub-extraction device, based on the base keyword, extracts from these result pages words that are similar or related to it as expanded keywords.
The sub-extraction device can determine the expanded keywords that are similar or related to the base keyword through natural language processing techniques such as semantic analysis.
More preferably, a search device (not shown) contained in the sub-acquisition device searches a web page database based on the base keyword to obtain multiple pieces of web page information corresponding to it; and a second selection device (not shown) contained in the sub-acquisition device selects from them, based on the quality information of each piece of web page information, one or more pieces whose quality information satisfies a predetermined quality condition.
Those skilled in the art will appreciate that the quality information of a web page can be determined from parameters such as its number of visits, number of external links and user dwell time; details are omitted here.
Then, the first selection device 3 selects, from the one or more expanded keywords, at least one expanded keyword contained in the poem corpus as the corpus keyword corresponding to the base keyword, so as to generate a corresponding verse based on the corpus keyword.
Specifically, the first selection device 3 checks whether each of the one or more expanded keywords is contained in the poem corpus; when an expanded keyword is contained in the poem corpus, it is used as a corpus keyword.
In one example according to the invention, the extraction device 1 obtains the base keyword "Liu Dehua" and determines that this base keyword is not contained in the poem corpus. The first acquisition device 2 then searches for and obtains one or more web pages for the base keyword "Liu Dehua", and obtains from them multiple expanded keywords corresponding to the base keyword, such as "king", "singer" and "performer". The first selection device 3 then selects the expanded keyword "king", which is contained in the poem corpus, as the corpus keyword corresponding to the base keyword "Liu Dehua", so as to generate a corresponding verse based on it.
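The structure of Fig. 2 can be pictured as one object that wires three components together. In the sketch below the extraction device and the first acquisition device are passed in as plain callables and the first selection device is a method; the class name, the field names and the stub components are assumptions for illustration and do not reflect any concrete implementation in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

@dataclass
class WordTakingDevice:
    extract: Callable[[str], List[str]]        # extraction device 1
    acquire: Callable[[str], Iterable[str]]    # first acquisition device 2
    poem_vocabulary: Set[str]                  # consulted by first selection device 3

    def select(self, expansions: Iterable[str]) -> List[str]:
        """First selection device 3: keep expansions found in the poem corpus."""
        return [w for w in expansions if w in self.poem_vocabulary]

    def corpus_keywords(self, request: str) -> Dict[str, List[str]]:
        """Map each base keyword in the request to its corpus keywords."""
        result = {}
        for base in self.extract(request):
            if base in self.poem_vocabulary:
                result[base] = [base]
            else:
                result[base] = self.select(self.acquire(base))
        return result

# Stub components (hypothetical): substring matching and a fixed expansion table.
KNOWN_WORDS = ["Liu Dehua", "moon"]
EXPANSION_TABLE = {"Liu Dehua": ["king", "singer", "performer"]}

device = WordTakingDevice(
    extract=lambda text: [w for w in KNOWN_WORDS if w in text],
    acquire=lambda base: EXPANSION_TABLE.get(base, []),
    poem_vocabulary={"king", "moon"},
)
mapping = device.corpus_keywords("a poem about Liu Dehua under the moon")
```

Passing the components in as callables keeps each one replaceable, which mirrors the way the description treats the sub-devices as separate units.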
Preferably, when basis key word corresponding one or more expanded keyword when, it is contained in the first selection device 3 In the second acquisition device (not shown) obtain the value information of the one or more expanded keyword respectively, and, comprise The first son in device 3 is selected to select device (not shown) based on the respective power of the one or more expanded keyword in first Value information therefrom selects at least one expanded keyword, as the language material key word corresponding with described basis key word.
The weight information indicates the importance of an expanded keyword. One example is the IDF value of the expanded keyword in the web database; another, when the expanded keyword is contained in the poem corpus, is its IDF value in the poem corpus.
More preferably, the first selection device 3 first selects a predetermined number (for example, x) of expanded keywords from the multiple expanded keywords corresponding to the base keyword, according to the IDF value of each expanded keyword in the web database. It then judges whether each of these selected expanded keywords is in the poem corpus, obtains the IDF value in the poem corpus of each expanded keyword that is contained in it (for example, y of them, where y ≤ x), and selects at least one expanded keyword based on this IDF value as the corpus keyword corresponding to the base keyword.
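The two-stage selection above can be sketched as follows. The source does not give an IDF formula or say whether higher or lower values are preferred, so this sketch assumes a common smoothed IDF, log((1 + N) / (1 + df)) + 1, and ranks in descending order. The names `idf` and `two_stage_select`, and the toy document sets, are hypothetical.

```python
import math
from typing import List, Sequence, Set


def idf(term: str, documents: Sequence[Set[str]]) -> float:
    """Smoothed inverse document frequency (assumed formula, not from the source)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log((1 + len(documents)) / (1 + df)) + 1.0


def two_stage_select(expanded: List[str],
                     web_docs: Sequence[Set[str]],
                     poem_docs: Sequence[Set[str]],
                     x: int,
                     top: int = 1) -> List[str]:
    # Stage 1: shortlist x expanded keywords by IDF in the web database.
    shortlist = sorted(expanded, key=lambda t: idf(t, web_docs), reverse=True)[:x]
    # Stage 2: keep the y <= x shortlisted keywords found in the poem corpus,
    # then rank them by IDF in the poem corpus.
    in_corpus = [t for t in shortlist if any(t in doc for doc in poem_docs)]
    ranked = sorted(in_corpus, key=lambda t: idf(t, poem_docs), reverse=True)
    return ranked[:top]


# Toy data: each set is the vocabulary of one document.
web_docs = [{"king", "singer"}, {"singer", "performer"}, {"singer"}]
poem_docs = [{"king", "moon"}, {"moon", "river"}, {"river"}]
result = two_stage_select(["king", "singer", "performer"], web_docs, poem_docs, x=2)
print(result)
```

Here "singer" is dropped in stage 1 because it appears in every web document, and "performer" is dropped in stage 2 because it is absent from the poem corpus.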
According to a preferred scheme of the present invention, when a base keyword is contained in the corpus, the word-taking device takes this base keyword as a corpus keyword.
According to another preferred scheme of the present invention, the word-taking device further includes a third acquisition device (not shown) and a determination device (not shown).
The third acquisition device obtains the type of poem to be generated.
The poem type includes the structural form of the poem, for example, five-character ancient verse, seven-character ancient verse, five-character quatrain, five-character regulated verse, and the various ci forms and ci tune names.
Specifically, the third acquisition device may determine the type of poem to be generated according to an input operation of the user, or according to a default type.
Then, the determination device determines, based on the poem type, the total number N of required corpus keywords.
Specifically, the determination device determines, according to the type of poem to be generated, the number of lines corresponding to that type, and determines the required number of keywords from the number of lines.
Preferably, the determination device takes the number of lines in the obtained poem type as the total number N of required corpus keywords.
For example, if the third acquisition device learns that the user has selected a seven-character quatrain, the determination device determines that the total number of required corpus keywords is 4. As another example, if the third acquisition device receives the user's selection of a five-character regulated verse, the total number of required corpus keywords is determined to be 8. As yet another example, if the type received by the third acquisition device is the ci tune "Niannujiao", the total number of corpus keywords is determined from the line pattern of that tune to be 8.
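The mapping from poem type to the keyword total N can be sketched as a simple lookup. The table below is a hypothetical stand-in that uses the line counts from the examples in the text (a quatrain has 4 lines, a regulated verse has 8). The function name `required_keyword_count` and the choice of default type are assumptions.

```python
from typing import Dict, Optional

# Hypothetical table: number of lines per poem type, one keyword per line.
LINES_PER_POEM_TYPE: Dict[str, int] = {
    "five-character quatrain": 4,
    "seven-character quatrain": 4,
    "five-character regulated verse": 8,
}

DEFAULT_POEM_TYPE = "seven-character quatrain"  # assumed default, not from the source


def required_keyword_count(poem_type: Optional[str] = None) -> int:
    """Return N, the total number of corpus keywords needed for a poem type."""
    # Fall back to the default type when the user made no selection.
    chosen = poem_type if poem_type is not None else DEFAULT_POEM_TYPE
    if chosen not in LINES_PER_POEM_TYPE:
        raise KeyError(f"unknown poem type: {chosen}")
    return LINES_PER_POEM_TYPE[chosen]


print(required_keyword_count("seven-character quatrain"))
print(required_keyword_count("five-character regulated verse"))
```

Ci tunes would need one table entry per tune, since their line patterns differ from tune to tune.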
According to a preferred embodiment of this scheme, when the number of base keywords extracted from the poem request information is greater than N, the word-taking device selects N base keywords from the multiple base keywords based on the weight information of each base keyword.
The manner in which the word-taking device selects N base keywords from the multiple base keywords based on their weight information is the same as or similar to the manner, described above, in which it selects at least one expanded keyword based on the respective weight information of the one or more expanded keywords, and is not repeated here.
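Trimming an over-long list of base keywords down to N can be sketched as a top-N selection by weight. The sketch assumes that a larger weight means a more important keyword. The function name `select_top_n` and the example weights are hypothetical.

```python
import heapq
from typing import Dict, List


def select_top_n(weights: Dict[str, float], n: int) -> List[str]:
    """Keep the n base keywords with the largest weights (assumed ordering)."""
    # Nothing to trim when there are at most n keywords.
    if len(weights) <= n:
        return list(weights)
    # Iterating a dict yields its keys; each key is ranked by its weight.
    return heapq.nlargest(n, weights, key=weights.get)


# Toy weights, e.g. IDF values of each extracted base keyword.
weights = {"spring": 2.1, "river": 3.4, "phone": 0.7, "moon": 2.9}
selected = select_top_n(weights, 2)
print(selected)
```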
According to another preferred embodiment of this scheme, when the number of corpus keywords already determined is less than N, the word-taking device expands the determined corpus keywords based on the poem corpus, so as to obtain the remaining number of corpus keywords from the corpus database.
Specifically, the word-taking device uses language model probabilities, obtained by computing statistics over the words in the poem corpus, to obtain the associated keywords (preceding-context and following-context keywords) corresponding to the one or more determined corpus keywords, and takes the obtained preceding/following-context keywords as corpus keywords.
The word-taking device may obtain the preceding/following-context keywords of each word directly from the language model probability corresponding to that word. Alternatively, the word-taking device may compute statistics for a given corpus keyword in the poem corpus in real time to obtain its language model probability, and from that obtain the preceding/following-context keywords corresponding to this corpus keyword.
For example, when the obtained corpus keyword is "sunset clouds", the word-taking device can determine, from the language model probabilities obtained by statistics over the keywords of the poems in the poem corpus, that the most commonly used following-context keyword is "lonely duck", and takes this following-context keyword as a corpus keyword.
As another example, when the obtained corpus keyword is "Changjiang River", the word-taking device computes statistics over the poem lines in the poem corpus that contain the word "Changjiang River", as follows:
(1)
Boundless / falling leaves / rustling / descend,
Endless / Changjiang River / billowing / comes.
(2)
Lonely sail / distant shadow / blue sky / vanishes,
Only see / Changjiang River / horizon / flows.
From the two verses above, it can be determined that the preceding-context keywords of "Changjiang River" include "endless" and "only see", and that its following-context keywords include "billowing", "horizon", and so on.
Then, the word-taking device may select from these, based on the language model probability, the corresponding preceding/following-context keywords as corpus keywords.
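Collecting preceding-context and following-context keywords from segmented poem lines can be sketched as below. It assumes the corpus lines are already segmented into words. The function name `context_keywords` is hypothetical, and the English glosses are illustrative tokens, not corpus data.

```python
from collections import Counter
from typing import List, Tuple


def context_keywords(lines: List[List[str]], keyword: str) -> Tuple[Counter, Counter]:
    """Count the words directly before and after `keyword` within each line."""
    preceding: Counter = Counter()
    following: Counter = Counter()
    for tokens in lines:
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            if i > 0:
                preceding[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                following[tokens[i + 1]] += 1
    return preceding, following


# Segmented lines containing the keyword (illustrative English glosses).
lines = [
    ["endless", "Changjiang River", "billowing", "comes"],
    ["only see", "Changjiang River", "horizon", "flows"],
]
preceding, following = context_keywords(lines, "Changjiang River")
print(preceding.most_common())
print(following.most_common())
```

Dividing each count by the number of occurrences of the keyword would turn these frequencies into the kind of conditional probability estimate used for ranking the candidates.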
Preferably, for the case where K corpus keywords need to be obtained and m corpus keywords (m < K) have currently been obtained, let $W_i$ denote the i-th keyword; the process of obtaining the remaining K-m corpus keywords can then be expressed by the following formula (2):
$$W_{m+1:K} = \arg\max_{W_{m+1:K}} P(W_{m+1:K} \mid W_{1:m}) \tag{2}$$
Here, $W_{m+1:K}$ denotes the sequence of the (m+1)-th through K-th keywords, and $P(W_{m+1:K} \mid W_{1:m})$ denotes the conditional probability that $W_{m+1:K}$ occurs given $W_{1:m}$ (the sequence of the 1st through m-th words).
According to the Markov assumption, the probability of occurrence of each word depends only on the preceding n-1 words (n is a hyperparameter here, typically set to 5). Here $P(W_{m+1:K} \mid W_{1:m})$ is solved with an n-gram language model, which gives the following formula (3):
$$P(W_{m+1:K} \mid W_{1:m}) = \prod_{j=m+1}^{K} P(W_j \mid W_{j-n+1}, \ldots, W_{j-1}) \tag{3}$$
Here $W_{j-n+1}, \ldots, W_{j-1}$ denote the n-1 words preceding the word $W_j$, and $P(W_j \mid W_{j-n+1}, \ldots, W_{j-1})$ denotes the conditional probability of generating $W_j$ given the preceding n-1 words.
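As an illustrative special case (the values n = 2, m = 2, and K = 4 are chosen here only for the example and do not come from the source), formula (3) reduces to a product of two bigram probabilities, because each word then depends only on the single word before it:

```latex
P(W_{3:4} \mid W_{1:2})
  = \prod_{j=3}^{4} P(W_j \mid W_{j-1})
  = P(W_3 \mid W_2)\, P(W_4 \mid W_3)
```

With larger n, each factor is conditioned on a longer history of up to n-1 preceding keywords.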
The probability $P(W_j \mid W_{j-n+1}, \ldots, W_{j-1})$ can be computed by maximum likelihood estimation, using the following formula (4):
$$P(W_j \mid W_{j-n+1}, \ldots, W_{j-1}) = \frac{C(W_{j-n+1}, \ldots, W_j)}{C(W_{j-n+1}, \ldots, W_{j-1})} \tag{4}$$
In formula (4), $C(W_{j-n+1}, \ldots, W_j)$ denotes a frequency count, that is, the number of occurrences of the word string $W_{j-n+1}, \ldots, W_j$ in the corpus; similarly, $C(W_{j-n+1}, \ldots, W_{j-1})$ denotes the number of occurrences of the word string $W_{j-n+1}, \ldots, W_{j-1}$ in the corpus.
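Formulas (2) to (4) can be sketched together in code. This sketch makes several assumptions that are not in the source: keyword sequences are padded with a hypothetical start marker `<s>`, n is set to 2 in the example to keep the toy data small, and the joint argmax of formula (2) is approximated greedily by adding the most probable next keyword one at a time. All function names are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

PAD = "<s>"  # hypothetical start-of-sequence marker


def count_ngrams(sequences: Iterable[Sequence[str]], n: int) -> Tuple[Counter, Counter]:
    """Count n-grams and (n-1)-grams over padded keyword sequences (n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    full: Counter = Counter()
    context: Counter = Counter()
    for seq in sequences:
        padded = [PAD] * (n - 1) + list(seq)
        for i in range(len(padded) - n + 1):
            full[tuple(padded[i:i + n])] += 1
        for i in range(len(padded) - n + 2):
            context[tuple(padded[i:i + n - 1])] += 1
    return full, context


def mle_prob(word: str, history: Sequence[str],
             full: Counter, context: Counter, n: int) -> float:
    """Formula (4): C(history + word) / C(history), using the last n-1 words."""
    ctx = tuple(([PAD] * (n - 1) + list(history))[-(n - 1):])
    denom = context[ctx]
    return full[ctx + (word,)] / denom if denom else 0.0


def extend_keywords(known: Sequence[str], k: int, vocabulary: Sequence[str],
                    full: Counter, context: Counter, n: int) -> List[str]:
    """Greedy approximation of formula (2): extend `known` to k keywords."""
    result = list(known)
    while len(result) < k:
        candidates = [w for w in vocabulary if w not in result]
        if not candidates:
            break
        best = max(candidates, key=lambda w: mle_prob(w, result, full, context, n))
        result.append(best)
    return result


# Toy keyword sequences standing in for statistics from the poem corpus.
sequences = [
    ["sunset clouds", "lonely duck", "autumn water"],
    ["sunset clouds", "lonely duck", "autumn water"],
    ["sunset clouds", "autumn water", "sky"],
]
n = 2
full, context = count_ngrams(sequences, n)
vocabulary = sorted({w for seq in sequences for w in seq})
result = extend_keywords(["sunset clouds"], 3, vocabulary, full, context, n)
print(result)
```

A beam search or exhaustive search over the remaining K-m positions would follow formula (2) more closely than the greedy loop, at a higher computational cost.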
According to another preferred scheme of the present invention, the word-taking device determines the poem request information based on received voice information; and/or the word-taking device converts the generated verse into voice information.
Those skilled in the art should convert the text or speech corresponding to the verse into the appropriate form according to the actual situation and requirements; details are omitted here.
According to the scheme of the present invention, base keywords are expanded in order to realize the conversion between base keywords and corpus keywords, so that poetic works can be generated automatically that both satisfy the original poem request information and meet the requirements of classical poetry for rhythm, word choice, and so on. This achieves a fusion between modern language culture and the forms and diction of poetry, so that the automatic poem generation mechanism can cope with the continual renewal and change of modern language and can therefore more broadly meet users' needs for generated poems.
The software program of the present invention can be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with a processor to perform each function or step.
In addition, part of the present invention can be applied as a computer program product, for example, computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical scheme according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the method and/or technical scheme based on the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be embraced in the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

1. A method for obtaining, based on artificial intelligence, keywords for generating poems, wherein the method comprises the following steps:
a. extracting one or more base keywords from poem request information;
b. when a base keyword is not in the poem corpus, obtaining one or more expanded keywords corresponding to this base keyword;
c. selecting, from the one or more expanded keywords, at least one expanded keyword that is contained in the poem corpus as the corpus keyword corresponding to this base keyword, so as to generate a corresponding verse based on this corpus keyword.
2. The method according to claim 1, wherein step b further comprises the following steps:
b1. obtaining one or more pieces of web page information corresponding to this base keyword;
b2. extracting, from the one or more pieces of web page information, the expanded keywords respectively corresponding to them, wherein the expanded keywords are different from the base keyword.
3. The method according to claim 2, wherein step b1 further comprises:
- searching a web database based on the base keyword, so as to obtain multiple pieces of web page information corresponding to this base keyword;
- selecting, based on the quality information of each piece of web page information, one or more pieces of web page information whose quality information satisfies a predetermined quality condition from the multiple pieces of web page information.
4. The method according to any one of claims 1 to 3, wherein step c further comprises the following steps:
- obtaining the weight information of each of the one or more expanded keywords;
- selecting at least one expanded keyword from the one or more expanded keywords, based on their respective weight information, as the corpus keyword corresponding to the base keyword.
5. The method according to any one of claims 1 to 4, wherein the method further comprises the following step:
- when a base keyword is contained in the corpus, taking this base keyword as a corpus keyword.
6. The method according to any one of claims 1 to 5, wherein the method further comprises the following steps:
- obtaining the type of poem to be generated;
- determining, based on the poem type, the total number N of required corpus keywords.
7. The method according to claim 6, wherein the method further comprises the following step:
- when the number of base keywords extracted from the poem request information is greater than N, selecting N base keywords from the multiple base keywords based on the weight information of each base keyword.
8. The method according to claim 6 or 7, wherein the method further comprises the following step:
- when the number of determined corpus keywords is less than N, expanding the determined corpus keywords based on the poem corpus, so as to obtain the remaining number of corpus keywords from the corpus database.
9. The method according to any one of claims 1 to 8, wherein the method further comprises the following step:
- determining the poem request information based on received voice information.
10. The method according to any one of claims 1 to 9, wherein the method further comprises the following step:
- converting the generated verse into voice information.
11. A word-taking device for obtaining, based on artificial intelligence, keywords for generating poems, wherein the word-taking device comprises:
an extraction device, for extracting one or more base keywords from poem request information;
a first acquisition device, for obtaining, when a base keyword is not in the poem corpus, one or more expanded keywords corresponding to this base keyword;
a first selection device, for selecting, from the one or more expanded keywords, at least one expanded keyword that is contained in the poem corpus as the corpus keyword corresponding to this base keyword, so as to generate a corresponding verse based on this corpus keyword.
12. The word-taking device according to claim 11, wherein the first acquisition device further comprises:
a sub-acquisition device, for obtaining one or more pieces of web page information corresponding to this base keyword;
a sub-extraction device, for extracting, from the one or more pieces of web page information, the expanded keywords respectively corresponding to them, wherein the expanded keywords are different from the base keyword.
13. The word-taking device according to claim 12, wherein the sub-acquisition device further comprises:
a search device, for searching a web database based on the base keyword, so as to obtain multiple pieces of web page information corresponding to this base keyword;
a second selection device, for selecting, based on the quality information of each piece of web page information, one or more pieces of web page information whose quality information satisfies a predetermined quality condition from the multiple pieces of web page information.
14. The word-taking device according to any one of claims 11 to 13, wherein the first selection device further comprises:
a second acquisition device, for obtaining the weight information of each of the one or more expanded keywords;
a first sub-selection device, for selecting at least one expanded keyword from the one or more expanded keywords, based on their respective weight information, as the corpus keyword corresponding to the base keyword.
15. The word-taking device according to any one of claims 11 to 14, wherein the word-taking device is further used for:
- when a base keyword is contained in the corpus, taking this base keyword as a corpus keyword.
16. The word-taking device according to any one of claims 11 to 15, wherein the word-taking device further comprises:
a third acquisition device, for obtaining the type of poem to be generated;
a determination device, for determining, based on the poem type, the total number N of required corpus keywords.
17. The word-taking device according to claim 16, wherein the word-taking device is further used for:
- when the number of base keywords extracted from the poem request information is greater than N, selecting N base keywords from the multiple base keywords based on the weight information of each base keyword.
18. The word-taking device according to claim 16 or 17, wherein the word-taking device is further used for:
- when the number of determined corpus keywords is less than N, expanding the determined corpus keywords based on the poem corpus, so as to obtain the remaining number of corpus keywords from the corpus database.
19. The word-taking device according to any one of claims 11 to 18, wherein the word-taking device is further used for:
- determining the poem request information based on received voice information.
20. The word-taking device according to any one of claims 11 to 19, wherein the word-taking device is further used for:
- converting the generated verse into voice information.
CN201610556319.1A 2016-07-14 2016-07-14 A kind of method and apparatus obtaining the key word generating poem based on artificial intelligence Pending CN106227714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610556319.1A CN106227714A (en) 2016-07-14 2016-07-14 A kind of method and apparatus obtaining the key word generating poem based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610556319.1A CN106227714A (en) 2016-07-14 2016-07-14 A kind of method and apparatus obtaining the key word generating poem based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN106227714A true CN106227714A (en) 2016-12-14

Family

ID=57520060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610556319.1A Pending CN106227714A (en) 2016-07-14 2016-07-14 A kind of method and apparatus obtaining the key word generating poem based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN106227714A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161214
