WO2019242001A1 - Method, computing device and system for generating content - Google Patents


Info

Publication number
WO2019242001A1
Authority
WO
WIPO (PCT)
Prior art keywords: word, word set, poem, words, segment
Application number
PCT/CN2018/092398
Other languages
French (fr)
Inventor
Ruihua Song
Yuanchun XU
Jing Yuan
Jianlong FU
Guang ZHOU
Wenfeng CHENG
Original Assignee
Microsoft Technology Licensing, Llc
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Priority to PCT/CN2018/092398 priority Critical patent/WO2019242001A1/en
Publication of WO2019242001A1 publication Critical patent/WO2019242001A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Definitions

  • the method includes receiving original material; obtaining a first word set including at least a word describing an affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) , wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  • Fig. 1 shows an environment in an exemplary embodiment that is operable to employ the techniques described in the present disclosure
  • Fig. 2 is a flow chart of a method for generating a modern poem according to an embodiment of the present disclosure
  • Fig. 3 is a flow chart of a process of obtaining a first word set including at least a word describing an affective feature of the original material according to an embodiment of the present disclosure
  • Fig. 4 shows an example of generating a poem based on a picture according to an embodiment of the present disclosure
  • Fig. 5 illustrates a schematic diagram of generating content in English according to an embodiment of the present disclosure
  • Fig. 6 illustrates a schematic diagram of generating content in Chinese according to an embodiment of the present disclosure
  • Fig. 7 illustrates a forward-version hierarchical poetry model including sentence level and poetry level according to an embodiment of the present disclosure
  • Fig. 8 illustrates a backward-version hierarchical poetry model including sentence level and poetry level according to an embodiment of the present disclosure
  • Fig. 9 shows another example of generating a poem based on a picture according to an embodiment of the present disclosure
  • Fig. 10 shows a scenario of generating a poem according to a picture and several days’ chatting log according to an embodiment of the present disclosure
  • Fig. 11 shows a scenario of using a word map to generate a poem according to an embodiment of the present disclosure
  • Fig. 12 is a block diagram depicting an exemplary computing device according to an embodiment of the present disclosure.
  • Fig. 13 is a block diagram depicting an exemplary system for generating content according to an embodiment of the present disclosure.
  • In some descriptions herein, values, procedures, or apparatus are referred to as “lowest” , “best” , “minimum” or the like. It will be appreciated that such descriptions are intended to indicate that a selection may be made among many functional alternatives used, and such selections need not be better, smaller, or otherwise preferable to other selections.
  • an exemplary environment is first described that is operable to employ the techniques described herein.
  • Example illustrations of the various embodiments are then described, which may be employed in the exemplary environment, as well as in other environments. Accordingly, the example environment is not limited to performing the described embodiments and the described embodiments are not limited to implementation in the example environment.
  • Fig. 1 shows an environment 100 in an example embodiment that is operable to employ the techniques described in this document.
  • the environment 100 for operating content generation includes a terminal device 101 that may communicate with a computing device, via networks.
  • the terminal device 101 may transmit a picture to the computing device.
  • After receiving the picture transmitted by the terminal device 101, the computing device obtains a word set and generates content by using the word set. Finally, the generated content is fed back to the terminal device 101.
  • the computing device may include one or more processors, and one or more network interfaces which enable communications between the computing device and other networked devices, such as the terminal device 101.
  • the computing device may be a server or a cloud server.
  • the terminal device may be included in the system for generating content in the present disclosure.
  • the terminal device 101 may belong to a variety of categories or classes of devices, such as traditional client-type devices, desktop computer-type devices, mobile-type devices, special purpose-type devices, embedded-type devices, and/or wearable-type devices.
  • the terminal device 101 of the various categories or classes may have one or more processors operably connected to machine-readable media.
  • Executable instructions stored on machine-readable media can include, for example, an operating system and/or modules, programs, or applications that are loadable and executable by processors.
  • the terminal device 101 also includes image storage for storing personal images. The images may be captured using a device-associated camera, although in some embodiments, the images could be captured using a device other than the camera included in the terminal device 101. In some embodiments, the terminal device may not include the camera but images may be uploaded or otherwise transferred to the terminal device 101, e.g., from a device equipped with a camera.
  • the terminal device 101 may also include one or more network interfaces 1013 to enable communications between the terminal device 101 and the computing devices. In one example, the terminal device 101 may send the image or other original material to the computing device for generating content based on the transmitted original material. The terminal device 101 may then display the generated content.
  • the terminal device 101 may include a processor 1011, a memory 1012, a network interface 1013, a display 1014, a chatting application 1015, an I/O interface 1016.
  • the terminal device 101 may further comprise a speaker and a microphone for providing voice input and output.
  • the network may include public networks such as the Internet, private networks such as an institutional and/or personal intranet, or some combination of private and public networks.
  • the network may also include any type of wired and/or wireless network, including but not limited to local area networks (LANs) , wide area networks (WANs) , satellite networks, cable networks, Wi-Fi networks, WiMAX networks, mobile communications networks (e.g., 3G, 4G, and so forth) or any combination thereof.
  • the network may utilize communications protocols, including packet-based and/or datagram-based protocols such as internet protocol (IP) , transmission control protocol (TCP) , user datagram protocol (UDP) , or other types of protocols.
  • IP internet protocol
  • TCP transmission control protocol
  • UDP user datagram protocol
  • the network may also include a number of devices that facilitate network communications and/or form a hardware basis for the networks, such as switches, routers, gateways, access points, firewalls, base stations, repeaters, backbone devices, and the like.
  • the network may further include devices that enable connection to a wireless network, such as a wireless access point (WAP) .
  • WAP wireless access point
  • the network may support connectivity through WAPs that send and receive data over various electromagnetic frequencies (e.g., radio frequencies) , including WAPs that support Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (e.g., 802.11g, 802.11n, and so forth) , and other standards.
  • IEEE Institute of Electrical and Electronics Engineers
  • the content generated according to the embodiments of the present disclosure may include, but is not limited to, poetry, novels, essays and plays. Since the processes of generating these various kinds of content are similar or the same, the method of generating content is described below by taking poetry as an example.
  • Fig. 2 is a flow chart of a method for generating a poem according to an embodiment of the present disclosure.
  • a poem may include at least one segment, and the term “segment” used herein may refer to, for example, a sentence or one or several words in the poem.
  • original material is received at step S101.
  • the original material may include a picture, an audio, a video, context or any combination thereof provided during a conversation with a chatbot.
  • the original material may be in a form of a message which may refer to a text including at least one word.
  • the term “word” used throughout the specification refers not only to a word but also to a Chinese character in the case where the content is generated in Chinese. Since the content may be generated based on various kinds of original material, the process of generating the content may be made simpler and more fun.
  • both an image and some words may be received to generate a poem during a conversation.
  • the user or a chatbot may talk to another user or talk to another chatbot via the terminal device.
  • the user may input a message including some words to express that he/she feels sad, and then take a picture of a scene in the rain.
  • the method according to the embodiment of the present disclosure will generate a poem based on both the picture and the user’s sadness (i.e., the message expressing sadness) .
  • the user may take a picture while listening to music, and the method of the present disclosure will generate a poem based on the picture and the music.
  • one or more pictures may be provided as the original material to generate a poem. Therefore, more convenience and fun may be provided when generating a poem by using the method according to embodiments of the present disclosure.
  • different styles of poem may also be provided for the user to select.
  • the user may select a style of poem, such as Crescent School Poem or Misty Poem, before or after he/she inputs a picture (for example, takes a picture or selects an existing picture) .
  • the method of the present disclosure will generate a poem based on the picture and the selected style.
  • a first word set including at least a word describing an affective feature of the original material is obtained.
  • the word describing affective feature may be an adjective, such as the word “busy” for describing the noun word “city” or the word describing affective feature may be a noun, such as the word “smile” for expressing a happy mood.
  • the affective feature of an image refers to the various subjective experiences that the image evokes in a human, such as feelings, impressions and even affection. Since words describing affective features may be obtained, the generated poem may be more impressive and touching to the reader. How to obtain the word describing an affective feature of the original material will be described in detail hereinafter.
  • a poem is generated by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) .
  • each segment of the poem employs a recurrent neural network language model (RNNLM) to predict a word next to the word selected for the segment.
  • Each word of a segment of the poem may be predicted sequentially from the previous word sequence as P (w_i | w_1:i-1) , where w_i is the i-th word and w_1:i-1 represents the preceding word sequence.
  • Due to the directivity of the RNNLM, one may only generate forward from the existing word. To allow the word to appear at any position in a sentence, a simple idea is to train a reverse version of the RNNLM (trained on the corpus in reverse order) and to generate backward from the existing word.
  • LM_forward and LM_backward denote the forward and backward RNNLMs.
  • The process of generating the j-th segment l_j from the j-th word k_j in the poem is described in Algorithm 1.
  • a forward RNN and a backward RNN are used to predict words next to a word in the first word set until two line breaks representing the beginning and the end of a segment respectively are predicted.
  • the above two RNNs are used to predict words next to a word in the first word set according to that word and the previous segment.
  • the first segment of the poem may be generated based on a first word in the first word set
  • the second segment of the poem may be generated based on the generated first segment and a second word in the word set. Therefore, the correlation of the generated segments among the poem may be improved.
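The forward/backward generation described above can be sketched as follows. This is a minimal illustration of the control flow only: the toy bigram tables are hypothetical stand-ins for the trained forward and backward RNNLMs, and `<s>`/`</s>` mark the line breaks that begin and end a segment.

```python
import random

# Toy bigram tables: hypothetical stand-ins for the trained forward and
# backward RNNLMs. "<s>" and "</s>" are the two line breaks representing
# the beginning and the end of a segment.
FORWARD = {"city": ["flows"], "flows": ["slowly"], "slowly": ["behind"],
           "behind": ["him"], "him": ["</s>"]}
BACKWARD = {"city": ["the"], "the": ["<s>"]}

def sample_next(model, word, stop):
    # A real RNNLM would sample from P(w_i | w_1:i-1); here we sample
    # from a bigram list, defaulting to the stop token.
    return random.choice(model.get(word, [stop]))

def generate_segment(seed):
    """Grow a segment around `seed` until both line breaks are predicted."""
    left, w = [], seed
    while True:  # backward pass: words preceding the seed
        w = sample_next(BACKWARD, w, "<s>")
        if w == "<s>":
            break
        left.insert(0, w)
    right, w = [], seed
    while True:  # forward pass: words following the seed
        w = sample_next(FORWARD, w, "</s>")
        if w == "</s>":
            break
        right.append(w)
    return " ".join(left + [seed] + right)

print(generate_segment("city"))  # "the city flows slowly behind him"
```

Because each toy table entry has a single continuation, the sketch is deterministic; a trained model would instead rank many candidate next words by probability.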
  • the poem may include more than one paragraph, each including a plurality of the segments, and a segment of at least one of the paragraphs may be predicted from a word in the first word set that was used in a previous paragraph.
  • For example, if the poem includes two paragraphs, each consisting of four segments, six words may be obtained, of which four words are used for generating the first paragraph.
  • the method may reuse two words that have been used in the first paragraph, together with the remaining two words, to generate the second paragraph.
  • the segments in different paragraphs may thus comprise the same words, achieving an effect of echoing between the earlier and later parts of the poem. Therefore, the correlation of the segments among the poem may be further improved.
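Under the assumption of the six-word, two-paragraph example above, the word reuse might be sketched as follows; `allocate_words` is an illustrative helper, not named in the disclosure, and the word list is taken from the Fig. 9 example later in the document.

```python
def allocate_words(words, per_paragraph=4, reuse=2):
    """Split six obtained words over two four-segment paragraphs,
    reusing the first `reuse` words of paragraph one in paragraph two."""
    first = words[:per_paragraph]
    second = first[:reuse] + words[per_paragraph:]
    return first, second

# Word set from the Fig. 9 example of the disclosure:
first, second = allocate_words(
    ["city", "wonderful", "person", "fire", "wind", "slightest"])
print(first)   # ['city', 'wonderful', 'person', 'fire']
print(second)  # ['city', 'wonderful', 'wind', 'slightest']
```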
  • an inverted index may be established based on the segments of the generated poem and the first word set, and the RNN may be trained based on the inverted index. Specifically, in the inverted index, the word used for generating one segment of the poem, the other words in this segment, and the segment itself are indexed to establish a correspondence from the words to the sentence. Then, the inverted index may be fed back to the RNN for learning. In a later generation, if a word from this segment other than the word originally used for generating it is obtained, the same segment is indexed. Therefore, the diversity of the generation of the poem may be improved.
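A minimal sketch of such an inverted index, mapping every word of a generated segment back to that segment (the feedback of the index into RNN training is omitted):

```python
from collections import defaultdict

def build_inverted_index(seeded_segments):
    """Map every word appearing in a generated segment to that segment,
    not just the seed word the segment was generated from."""
    index = defaultdict(list)
    for seed_word, segment in seeded_segments:
        for word in segment.split():
            if segment not in index[word]:
                index[word].append(segment)
    return index

index = build_inverted_index(
    [("city", "the city flows slowly behind him")])
# Any word of the segment now retrieves it, e.g. "flows" as well as "city":
print(index["flows"] == index["city"])  # True
```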
  • the method may further include performing a check on the generated words or sentences based on an N-GRAM and a skip N-GRAM model to measure whether a word is appropriate and whether two words in a sentence are semantically consistent.
  • the method may further include a checking step to determine the completeness of the generated poem.
  • the trained RNN may predict that the word after the adjective “big” should be the words “mountain” or “trees” .
  • the N-GRAM model may detect that the word “sea” in this sentence is not appropriate.
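The appropriateness check might be sketched as follows, with hypothetical bigram and skip-bigram counts standing in for N-GRAM and skip N-GRAM models trained on a large corpus:

```python
# Hypothetical co-occurrence counts, for illustration only.
BIGRAM = {("big", "mountain"): 120, ("big", "trees"): 95}
SKIP_BIGRAM = {("big", "mountain"): 150, ("big", "trees"): 80}

def word_is_appropriate(context_word, word, threshold=1):
    """A word is kept only if it (skip-)co-occurs with its context
    often enough in the corpus."""
    score = (BIGRAM.get((context_word, word), 0)
             + SKIP_BIGRAM.get((context_word, word), 0))
    return score >= threshold

print(word_is_appropriate("big", "mountain"))  # True
print(word_is_appropriate("big", "sea"))       # False: "sea" is flagged
```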
  • an initial word set (also referred to as a second word set) may be extracted by applying, to the original material, two parallel convolutional neural networks (CNNs) , of which one is for extracting a word describing an affective feature and the other is for extracting a word describing an object from the original material.
  • the two CNNs may be trained by using a great number of images, music and/or videos.
  • the two CNNs share the same network architecture but with different parameters. Specifically, one CNN learns to describe objects by outputting noun words, and the other CNN learns to understand affective features by outputting adjective words.
  • the two CNNs may be pre-trained on ImageNet and fine-tuned on noun and adjective categories, respectively.
  • the extracted deep convolutional representations are denoted as W c *I, where W c denotes the overall parameters of one CNN, *denotes a set of operations of convolution, pooling and activation, and I denotes the input image.
  • W c denotes the overall parameters of one CNN
  • I denotes the input image.
  • f ( ⁇ ) represents fully-connected layers to map convolutional features to a feature vector that could be matched with the category entries and includes a softmax layer to further transform the feature vector to probabilities.
  • the probabilities over the noun and adjective categories are denoted as p_n (C|I) and p_a (C|I) , respectively.
  • Since GoogLeNet may produce first-class performance in the ImageNet competition, GoogLeNet is selected as the basic network structure. Then, 272 nouns and 181 adjectives are taken as the categories for noun and adjective CNN training, since these categories are adequate to describe common objects and affective features in images.
  • the training time for each CNN is about 50 hours on a Tesla K40 GPU, with top-one classification accuracies of 92.3% and 85.4% on the initial word set, respectively.
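A sketch of the final classification step, i.e. the f ( · ) above: a fully-connected layer mapping extracted features to a vector matched against the category entries, followed by softmax to produce probabilities. The feature vector, head weights and category lists are random or hypothetical placeholders for the trained noun and adjective CNN heads.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(features, W_fc, categories, top_k=2):
    """Fully-connected layer plus softmax over category entries,
    returning the top_k (category, probability) pairs."""
    probs = softmax(W_fc @ features)
    order = np.argsort(probs)[::-1][:top_k]
    return [(categories[i], float(probs[i])) for i in order]

rng = np.random.default_rng(0)
features = rng.normal(size=16)           # stand-in for W_c * I
noun_head = rng.normal(size=(4, 16))     # parameters of the noun head
adj_head = rng.normal(size=(4, 16))      # parameters of the adjective head
print(classify(features, noun_head, ["city", "street", "traffic", "road"]))
print(classify(features, adj_head, ["busy", "scare", "broken", "quiet"]))
```

The two heads share one input representation but use different parameters, mirroring the two parallel CNNs described in the text.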
  • inappropriate words may be removed from the initial word set to generate an intermediate word set (also referred to as a third word set) .
  • words with a confidence score lower than a predefined threshold may be removed from the initial word set. Because of this removal, the words used for generating the poem may be more accurate.
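The threshold-based removal can be sketched as follows; the confidence scores below are hypothetical and do not reproduce Table 1.

```python
def filter_by_confidence(word_scores, threshold=0.5):
    """Drop words whose confidence score is below the threshold."""
    return [word for word, score in word_scores.items()
            if score >= threshold]

# Hypothetical scores for the eight words of the Fig. 4 example.
initial = {"city": 0.96, "street": 0.45, "traffic": 0.28, "road": 0.21,
           "tiny car": 0.19, "busy": 0.91, "scare": 0.33, "broken": 0.07}
print(filter_by_confidence(initial))  # ['city', 'busy']
```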
  • the first word set may be obtained based on the intermediate word set.
  • a word with high co-occurrence with a word in the intermediate word set may be selected from a corpus based on at least one literary genre including modern poem, and the first word set may be obtained by adding the selected words into the intermediate word set.
  • a frequent word in the corpus may also be selected.
  • a word that not only occurs frequently in the corpus but also has high co-occurrence with a word from the third word set in the corpus may be selected, so as to generate the first word set by adding the selected words into the third word set.
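One way to sketch the combined frequency/co-occurrence selection, assuming a toy corpus and treating words on the same line as co-occurring; the stopword list and the frequency-times-co-occurrence score are illustrative choices, not specified in the disclosure.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "in", "of", "my", "and"}

def expand_word_set(seed_words, corpus_lines, top_k=2):
    """Add words that are both frequent in the corpus and co-occur
    with a seed word, scoring by frequency * co-occurrence count."""
    freq, cooc = Counter(), Counter()
    for line in corpus_lines:
        words = line.split()
        freq.update(words)
        if any(s in words for s in seed_words):
            cooc.update(w for w in words
                        if w not in seed_words and w not in STOPWORDS)
    scored = {w: freq[w] * cooc[w] for w in cooc}
    best = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return list(seed_words) + best

corpus = ["the city is a busy place",
          "a place in my heart",
          "the child of the city",
          "time and place and life"]
print(expand_word_set(["city", "busy"], corpus))
# ['city', 'busy', 'place', 'child']
```

On this toy corpus, “place” and “child” come out on top, matching the high-co-occurrence words the text names for “city”.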
  • a word map including a plurality of words may be generated based on the corpus.
  • one of a direct or indirect connection may exist between any two of the plurality of words.
  • a word in the third word set that is rarely used may be replaced with a word having a connection with the rare word in the word map.
  • a word that has connection with a word in the third word set may be selected from the word map and the first word set may be obtained by adding the selected words into the intermediate word set.
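A minimal sketch of replacing a rare word using the word map; the map, the set of common words and the tie-breaking rule are illustrative assumptions.

```python
# Hypothetical word map: edges connect words that co-occur in the corpus;
# COMMON lists words considered frequent enough to use directly.
WORD_MAP = {
    "thoroughfare": {"street"},
    "street": {"city", "thoroughfare"},
    "city": {"place", "street"},
}
COMMON = {"city", "place", "street", "busy"}

def replace_rare(words):
    """Replace a rarely used word with a connected common word from the
    word map, leaving it unchanged if no such neighbour exists."""
    result = []
    for word in words:
        if word in COMMON:
            result.append(word)
            continue
        neighbours = WORD_MAP.get(word, set()) & COMMON
        result.append(min(neighbours) if neighbours else word)
    return result

print(replace_rare(["thoroughfare", "busy"]))  # ['street', 'busy']
```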
  • the first word set comprising more accurate words may be obtained. Therefore, a long accurate poem may be generated easily.
  • the initial word set or the intermediate word set themselves may be directly used as the first word set.
  • Fig. 4 is an example of generating a poem based on a picture according to an embodiment of the present disclosure.
  • a process of generating a poem from the picture in Fig. 4 will be described in detail as follows.
  • Fig. 4 illustrates a scenario in which a user is talking to a chatbot on the way to work and complaining about the heavy traffic in the morning. The user may therefore wish the chatbot to write a poem based on the picture taken by the user of the traffic.
  • the terminal device may transmit the picture to the computing device.
  • the computing device may extract an initial word set by applying, to the picture, two parallel CNNs, wherein one CNN is for extracting a word describing an affective feature and the other CNN is for extracting a word describing an object from the picture. The initial word set thus obtained may be {city, street, traffic, road, tiny car, busy, scare, broken} .
  • the words for generating a poem may be identified by the CNNs trained using ImageNet. The words “city” , “street” , “traffic” and “tiny car” are nouns, and the words “busy” , “scare” and “broken” are words describing affective features.
  • the initial word set may comprise words not appropriate for this picture.
  • the word “broken” in the initial word set is obviously not related to this traffic scenario. Therefore, to remove irrelevant words, it is proposed to calculate confidence scores for the above eight words in the initial word set {city, street, traffic, road, tiny car, busy, scare, broken} and remove the words with low confidence scores. As a result, only words with high confidence scores will be used in the next step.
  • the words with confidence scores less than a predefined value, for example 0.5, may be removed. Table 1 shows the words and their corresponding confidence scores.
  • the third word set (i.e., the intermediate word set) {city, busy} may be generated.
  • words are sampled according to the word distribution of the trained corpus. The more frequently a word occurs in the corpus, the greater the chance that it will be selected.
  • the nouns with the highest frequency of occurrence in our corpus are “life” , “time” and “place” . Applying these words may enhance both the variation and imagination of the generated poem without deviating too much from the theme.
  • the corpus may be trained based on a great number of literary genres, such as modern poetry and fiction.
  • Another implementation is to consider words with high co-occurrence with the words of the initial word set.
  • An embodiment of the present disclosure samples words according to the distribution of their co-occurrence frequency with the words of the initial word set. The more often a word co-occurs with the selected words of the initial word set, the greater the chance that it will be selected as a word for generation. Taking the picture in Fig. 4 as an example, the words with the highest co-occurrence with the word “city” , such as “place” , “child” , “heart” and “land” , may be selected as words for generation. Unlike the highly frequent words, these highly co-occurring words usually have more relevance to the words obtained directly from the picture. Therefore, the result may relate to more specific topics of the poem.
  • Table 2 shows three approaches of obtaining a first word set and corresponding average score, word irrelevant rate and segment irrelevant rate.
  • without_expand represents an approach of generating content by using appropriate words from the intermediate word set
  • expand_freq represents an approach of generating content by using frequent words
  • expand_cona represents an approach of generating content by using words with a high confidence score.
  • “Frequent word” and “High co-occurred word” may be combined to generate the first word set. For example, a word that is not only frequent in the corpus but also has high co-occurrence with a word in the third word set is “place” . Therefore, the word “place” , which may be more relevant to the picture, may be included in the first word set.
  • a first word set of {city, busy, place, smile} may be obtained.
  • a poem may be generated based on the first word set and the RNN that is trained by using at least one literary genre including modern poetry.
  • the generation of the poem may include generating a first segment of the poem based on one word in the first word set through a forward RNN and a backward RNN, and a next segment based on another word in the first word set and the previously generated segments.
  • Fig. 5 illustrates a schematic diagram of generating content in English according to an embodiment of the present disclosure.
  • the first word set ⁇ city, busy, place, smile ⁇ which is generated as above will be taken as an example.
  • the process of generating the first segment of the poem is as follows.
  • the word in the first word set for generating a first segment may be “City”
  • a trained forward RNN model predicts that the next word following the word “city” with a high probability is “flow” .
  • However, the word “city” is not necessarily located at the beginning of the segment. Therefore, it is necessary to use a backward RNN model to predict the location of the word “city” in this segment.
  • Since the backward RNN model predicts with a high probability that a line break is in front of the word “city” , it is determined that the word “city” should be located at the beginning of this segment. The forward RNN model then keeps predicting that “slowly” , “behind” and “him” follow the word “flow” . At last, the forward RNN model predicts another line break after the word “him” , which means this segment of the poem is complete. The segment is therefore generated as “the city flows slowly behind him” . Other segments of the poem may be generated in a similar way as described hereinabove. Finally, a poem including four segments is generated:
  • the words “busy” and “smile” are at the end of their respective segments of the poem and the word “place” is in the middle of a segment of the poem. Therefore, through the forward RNN model and the backward RNN model, the word for generating a segment of the modern poem may be at any position in that segment, unlike machine methods for generating acrostic poems, in which the word must be located at the beginning of each segment.
  • Fig. 6 illustrates a schematic diagram of generating content in Chinese according to an embodiment of the present disclosure.
  • a trained forward RNN model predicts that the next word following the word “ ⁇ ” with a high probability is “ ⁇ ” .
  • the backward RNN model predicts with a high probability that there is a line break before the word “ ⁇ ” ; therefore, it is determined that the word “ ⁇ ” should be at the beginning of the first segment of the poem.
  • the forward RNN model keeps predicting the words “ ⁇ ” , “ ⁇ ” , “ ⁇ ” , “ ⁇ ” and “ ⁇ ” in order until it predicts another line break, which represents that the first segment is completed. Therefore, the first segment of the poem is generated as “ ⁇ ” . Finally, the four segments of the Chinese poem are generated as:
  • the fluency of segments of the poem may be controlled with the RNNLM model and a recursive strategy.
  • maintaining consistency between segments of the poem may be necessary. Since a segment may be generated in two directions, using the state of the RNNLM to pass information is no longer feasible.
  • An embodiment of the present disclosure may extend the input gate of the RNNLM model to two parts: one is the original previous-word input, and the other is the previous segment’s information.
  • an embodiment of the present disclosure predicts the content vector of the next segment of the poem from all the previously generated segments when generating the j-th segment l_j in the poem.
  • Fig. 7 illustrates a hierarchical poetry model including sentence level and poem level according to an embodiment of the present disclosure.
  • w_0 represents the start of the next segment of the poem (i.e., a line break)
  • w_1, ..., w_i+1 represent the words for generating the next segment of the poem.
  • l_0 represents the start of the poem (i.e., a line of null)
  • l_1, ..., l_i represent the previously generated segments of the poem.
  • Fig. 7 only illustrates the forward-version hierarchical poetry model; the backward-version hierarchical poetry model may be obtained by reversing the dotted line. Since the principles of the forward and backward versions of the hierarchical poetry model are the same, the description of the backward-version hierarchical poetry model in Fig. 8 will not be repeated here.
  • This hierarchical poetry model affects the generation of the next segment of the poem by using information about the previous segments of the poem.
  • the RNN encodes a segment of the poem into a vector through a sentence-level encoder.
  • the generation of the next segment of the poem should consider not only the word for the next segment but also the generated first segment.
  • the RNN automatically generates a state serving as the encoding of the previous segment of the poem, and this state, which includes the information about the previous sentence, is input into the generation of the next segment of the poem.
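The two-level conditioning might be sketched as follows; averaged hash-based embeddings are placeholders for the poem-level LSTM state, so this only illustrates how the encoding of the previous segments is combined with the seed word for the next segment.

```python
import numpy as np

def embed(word, dim=8):
    # Deterministic per-run pseudo-embedding keyed on the word
    # (a placeholder for learned word vectors).
    rng = np.random.default_rng(abs(hash(word)) % (2**32))
    return rng.normal(size=dim)

def poem_level_state(previous_segments, dim=8):
    """Encode all previously generated segments into one content vector
    (l_0, the start of the poem, is the zero vector)."""
    if not previous_segments:
        return np.zeros(dim)
    segment_vectors = [np.mean([embed(w) for w in seg.split()], axis=0)
                       for seg in previous_segments]
    return np.mean(segment_vectors, axis=0)

state = poem_level_state(["the city flows slowly behind him"])
# The sentence-level generator is conditioned on the seed word for the
# next segment together with the poem-level state:
conditioning = np.concatenate([embed("busy"), state])
print(conditioning.shape)  # (16,)
```

A real implementation would replace both the segment encoder and the averaging with LSTMs, as the hierarchical model in Figs. 7 and 8 describes.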
  • the process of generating the first and second segments “the city flows slowly behind” and “my life is busy” will be taken as an example.
  • the first segment “the city flows slowly behind” is generated according to the forward RNN, the backward RNN and the word city in the first word set.
  • the generated segment “the city flows slowly behind” is fed back to the poem level LSTM as l_1.
  • l_1 is encoded as l′_1 and input into the sentence level LSTM to serve as predicted content for generating the next segment, which should include the word “busy” .
  • the next segment “my life is busy” is generated in order, and “my life is busy” is then returned to the poem level LSTM for future generation.
  • the above generated poem may be taken as an example.
  • l_1 represents the generated first segment of the Chinese poem “ ⁇ ” and l_j represents the generated second segment of the Chinese poem “ ⁇ ”
  • l_j+1 represents the generated third segment of the poem “ ⁇ ”
  • w_i+1 represents the obtained word “ ⁇ ” for the last segment of the poem.
  • the RNN model may predict that w_1 is “ ⁇ ” and w_2 is “ ⁇ ” ; therefore, based on the previous three segments of the poem and the word “ ⁇ ” , the last segment of the poem may be generated as “ ⁇ ” and then input back to the RNN model as a further input to generate the possible next segment of the poem. In this way, the consistency between segments of the poem may be improved.
  • an inverted index may be established according to the respective segments of the generated poem and the first word set. Take the above first segment of the generated poem, “the city flows slowly behind him” , as an example. After this segment is generated based on the word “city” , an inverted index in which “city” , “flows” , “slowly” , “behind” and “him” each correspond to the segment “the city flows slowly behind him” may be established; the inverted index is then fed back to the trained RNN for learning. Therefore, in the future, if the obtained word is “flows” rather than “city” , the same sentence “the city flows slowly behind him” may still be generated. Therefore, the diversity of the generated poem may be improved.
  • Fig. 9 is another scenario of generating a poem based on a picture according to an embodiment of the present disclosure.
  • the generated poem includes two paragraphs each including four segments, and two segments of the second paragraph comprise words used in the first paragraph.
  • a user who is travelling in New York is talking to a chatbot and requests the chatbot to write a poem according to a picture of Times Square.
  • the poem is generated as follows.
  • the poem is generated according to the word set ⁇ city, wonderful, person, fire, wind, slightest ⁇ .
  • the first paragraph may be generated by using the first four words “city” , “wonderful” , “person” and “fire” .
  • the second paragraph may be generated by using the words “wind” and “slightest” and reusing the words “city” and “wonderful”. By reusing the words “city” and “wonderful”, which have been used in the first paragraph, the consistency among segments of the poem may be improved.
  • the obtained word set is not limited to the word set extracted from the image.
  • the word “wonderful” may be selected from the conversation to be used as a word for writing the poem.
  • the conversation between the user and the chatbot may use the voice message or text message.
  • the user inputs a voice message “what a wonderful day” by talking to the speaker of the terminal device; the computing device then obtains and identifies the voice message and extracts the word “wonderful” as a word in the first word set to write the poem.
  • the conversation includes, but is not limited to, a conversation between a user and a chatbot, between two users, or between two chatbots.
  • the scenarios of the present disclosure are not limited to the conversations described above.
  • the scenarios of the present disclosure may include a scenario in which an application installed on the terminal device receives a picture and generates content according to the present disclosure.
  • Fig. 10 illustrates a scenario of generating a poem according to a picture in combination with several days’ chatting log according to an embodiment of the present disclosure.
  • the user requested the chatbot to add a reminder about strolling on Sunday.
  • the user uploads a picture during the strolling and requests the chatbot to write a poem according to the picture.
  • the word “empty” may be extracted from the chatting log between the user and the chatbot on Friday. Therefore, based on the word “empty” from Friday and the words extracted from the picture shot by the user on Sunday, the poem is generated as follows.
  • Fig. 11 illustrates a scenario of using a word map to generate a poem according to an embodiment of the present disclosure.
  • the user takes a picture of the sunset and requests the chatbot to generate a poem including two paragraphs, each including four segments, according to the picture of the sunset.
  • the generated word set may be ⁇ cloud, lady, heart, sky ⁇ .
  • the four words in this word set are not sufficient to generate a complete poem.
  • the word “lady” in this word set is already a rare word, typically used at the beginning of the last century. Therefore, to make up for the lack of words in the word set and to replace rare words in it, the embodiment of the present disclosure proposes to establish a word map.
  • the word “lady” is a rare word that is not appropriate in a modern poem, and a more popular and more appropriate word, such as “woman” or “girl”, may be found through the word map; the word “lady” may thus be replaced with the word “girl”. In this way, the diversity of the word set may be improved. Moreover, since the word map may find a word more appropriate for the semantic environment, the generated poem may match the image more easily and touch the heart of the reader. In addition, through the word map, the words “beauty” and “cute”, which describe affective features, are also extended and selected for generating the poem. Therefore, through the method for generating content according to the present disclosure in combination with the word map, the poem is generated as follows.
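The rare-word replacement step can be sketched as a graph walk. Both the word map and the set of rare words below are hypothetical examples; the disclosure only requires that any two words be connected directly or indirectly, so a breadth-first search finds the nearest non-rare substitute.

```python
# Sketch of rare-word replacement via the word map, assuming the map is an
# undirected graph of related words and a known set of rare words. Both
# structures here are hypothetical examples for illustration.

from collections import deque

WORD_MAP = {
    "lady": {"woman"},
    "woman": {"lady", "girl"},
    "girl": {"woman", "cute"},
    "cute": {"girl"},
}
RARE_WORDS = {"lady"}

def replace_rare(word, word_map=WORD_MAP, rare=RARE_WORDS):
    """Walk direct and indirect connections to find a non-rare substitute."""
    if word not in rare:
        return word
    seen, queue = {word}, deque(word_map.get(word, ()))
    while queue:
        candidate = queue.popleft()
        if candidate in seen:
            continue
        seen.add(candidate)
        if candidate not in rare:
            return candidate  # nearest non-rare neighbour in the word map
        queue.extend(word_map.get(candidate, ()))
    return word  # no substitute found; keep the original word

print(replace_rare("lady"))  # -> "woman"
```

A real implementation might rank candidates by corpus frequency or co-occurrence rather than taking the first non-rare neighbour.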
  • the cloud is an unruly shock
  • Fig. 12 is a block diagram depicting an exemplary computing device 102 according to an embodiment of the present disclosure.
  • the computing device may comprise one or more processors 1021 and a memory 1022 storing machine-executable instructions that, when executed, cause the one or more processors to perform actions comprising: receiving original material; obtaining a first word set including at least a word describing an affective feature of the original material; and generating content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  • the original material may comprise at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
  • the action of obtaining a first word set comprises: extracting a second word set by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting a word describing sentiment and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
  • the action of obtaining the first word set based on the second word set comprises: removing words with confidence scores lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus based on at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and generating the first word set by combining the selected words and the third word set.
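The three steps above can be sketched as a small pipeline. The confidence scores, thresholds and co-occurrence table below are hypothetical; a real system would take the scores from the CNNs and the co-occurrence statistics from the modern-poem corpus.

```python
# Sketch of deriving the first word set from the second word set:
# (1) drop words below the confidence threshold (third word set),
# (2) select corpus words that co-occur strongly with the survivors,
# (3) combine. All numbers and tables here are hypothetical examples.

SECOND_WORD_SET = {"city": 0.92, "wonderful": 0.85, "fence": 0.30}
CO_OCCURRENCE = {  # corpus co-occurrence scores with modern-poem vocabulary
    "city": {"busy": 0.8, "street": 0.4},
    "wonderful": {"life": 0.7},
}

def first_word_set(scored_words, co_occ, conf_threshold=0.5, co_threshold=0.6):
    # Step 1: remove low-confidence words to form the third word set.
    third = {w for w, s in scored_words.items() if s >= conf_threshold}
    # Step 2: select corpus words with high co-occurrence with the third set.
    selected = {c for w in third
                for c, s in co_occ.get(w, {}).items() if s >= co_threshold}
    # Step 3: combine the selected words with the third word set.
    return third | selected

print(sorted(first_word_set(SECOND_WORD_SET, CO_OCCURRENCE)))
# -> ['busy', 'city', 'life', 'wonderful']
```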
  • the corpus is based on at least another literary genre than modern poem and the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
  • the corpus is based on at least another literary genre than modern poem and the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and generating the first word set by combining the third word set with a word which has a connection with a word in the third word set according to the word map.
  • the actions performed by the computing device further comprise: establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
  • the content includes more than one paragraph, each including a plurality of segments, and a segment of a paragraph comprises a word used in a previous paragraph.
  • Fig. 13 is a block diagram depicting an exemplary system 400 for generating content according to an embodiment of the present disclosure, which may comprise a computing device 401 and a terminal device 402.
  • the computing device 401 may communicate with the terminal device 402 via a network 403.
  • the computing device 401 comprises a memory and one or more processors coupled to the memory; the terminal device 402 is configured to transmit original material to the computing device; and the memory stores machine-executable instructions that, when executed, cause the one or more processors to: receive the original material from the terminal device; obtain a first word set including at least a word describing sentiment implied by the original material; generate content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem; and transmit the generated content to the terminal device.
  • the original material comprises at least one of a picture, an audio, a video, and information from a conversation with a chatbot.
  • the obtaining of a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting a word describing an affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
  • the content includes more than one paragraph, each including a plurality of segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  • the present disclosure proposes a method for generating content including at least receiving original material; obtaining a first word set including at least a word describing affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) , wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  • the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
  • the obtaining of a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting a word describing an affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
  • the obtaining of the first word set based on the second word set comprises: removing words with confidence scores lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus trained on at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and obtaining the first word set by adding the selected words into the third word set.
  • the corpus is trained on at least a literary genre besides modern poem and the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
  • the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words, and the corpus is trained on at least another literary genre besides modern poem; removing words with confidence scores lower than a predefined threshold from the second word set to generate a third word set; and obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
  • the content includes more than one paragraph, each including a plurality of segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  • the present disclosure proposes a computing device, comprising one or more processors and a memory storing machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform actions comprising: receiving original material; obtaining a first word set including at least a word describing an affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  • the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
  • the action of obtaining a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting a word describing an affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
  • the action of obtaining the first word set based on the second word set comprises: removing words with confidence scores lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus trained on at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and obtaining the first word set by adding the selected words into the third word set.
  • the corpus is trained on at least a literary genre besides modern poem and the actions further comprise: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
  • the actions further comprise: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words, and the corpus is trained on at least another literary genre besides modern poem; removing words with confidence scores lower than a predefined threshold from the second word set to generate a third word set; and obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
  • the actions further comprise: establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
  • the content includes more than one paragraph, each including a plurality of segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  • the present disclosure proposes a system for generating content, comprising: a computing device comprising a memory and one or more processors coupled to the memory; and a terminal device configured to transmit original material to the computing device; wherein the memory stores machine-executable instructions that, when executed, cause the one or more processors to: receive the original material from the terminal device; obtain a first word set including at least a word describing sentiment implied by the original material; generate content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem; and transmit the generated content to the terminal device.
  • the original material comprises at least one of a picture, an audio, a video, and information from a conversation with a chatbot.
  • the obtaining of a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting a word describing an affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
  • the content includes more than one paragraph, each including a plurality of segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  • the method for generating content according to embodiments of the present disclosure will be compared with two existing methods for generating content.
  • the first existing method is Image2caption, which generates English sentences from a given image. Taking the picture in Fig. 10 as an example, the content generated by using Image2caption is:
  • the second existing method is an application designed by CTRIP.
  • a traditional poem may be generated. Still taking the picture in Fig. 10 as an example, the traditional poem generated by using the application of CTRIP from this picture is:
  • 22 assessors from a variety of career fields were chosen, including: 1) 8 female users and 14 male users; 2) 13 users with a bachelor’s degree and 1 user with a master’s or higher degree; 3) 11 users who prefer traditional poetry, 10 users who prefer modern poetry and 1 user who prefers neither.
  • the method for generating content gains higher relevance to images by leveraging successful word extraction and expansion, and more easily stimulates interesting ideas and generates emotional resonance.
  • Machine programs incorporating various features described herein may be encoded and stored on various machine-readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and other non-transitory media.
  • Machine readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged machine-readable storage medium) .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide a method for generating content, a computing device and a system. The method may receive original material; obtain a first word set including at least a word describing an affective feature of the original material; and generate the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem. The content generated by the method according to the present disclosure may be more impressive and touching, and may have more space and better consistency.

Description

METHOD, COMPUTING DEVICE AND SYSTEM FOR GENERATING CONTENT Background
With the development of Artificial Intelligence (AI) technology, it has been widely applied to various fields such as media and literature. For example, on March 3, 2016, Bradley Hayes, a postdoctoral fellow at the CS & AI Lab of MIT, used deep learning to develop a Twitter robot that mimics US presidential candidate Donald Trump, called DeepDrumpf. Furthermore, on March 22, 2016, Japan’s Kyodo News Agency reported that a novel created by AI was included in the first review of the 3rd Star New Literature Award. This award is named after Shinichi Hoshi, a science fiction writer known as “the father of Japanese miniature novels.”
Summary
This summary is provided to introduce content generation from original material that will be further described below in the Detailed Description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Aspects of the present disclosure discussed herein include a method, a computing device and a system for generating content. The method includes receiving original material; obtaining a first word set including at least a word describing affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) , wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
Additional aspects and advantages of the present disclosure will be appreciated and become apparent from the descriptions below, or will be learned from practice of the present disclosure.
The Description of Drawings
To describe the technical solutions according to the embodiments of the present disclosure more clearly, the drawings to be used in the embodiments will be briefly described below. The following drawings just show some of the embodiments of the present disclosure and thus they should not be considered as any limitation to the scope, and those skilled in the art may also obtain other related drawings according to these drawings without paying any creative effort.
Fig. 1 shows an environment in an exemplary embodiment that is operable to employ the techniques described in the present disclosure;
Fig. 2 is a flow chart of a method for generating a modern poem according to an embodiment of the present disclosure;
Fig. 3 is flow chart of a process of obtaining a first word set including at least a word describing affective feature of the original material according to an embodiment of the present disclosure;
Fig. 4 shows an example of generating a poem based on a picture according to an embodiment of the present disclosure;
Fig. 5 illustrates a schematic diagram of generating content in English according to an embodiment of the present disclosure;
Fig. 6 illustrates a schematic diagram of generating content in Chinese according to an embodiment of the present disclosure;
Fig. 7 illustrates a forward-version hierarchical poetry model including sentence level and poetry level according to an embodiment of the present disclosure;
Fig. 8 illustrates a backward-version hierarchical poetry model including sentence level and poetry level according to an embodiment of the present disclosure;
Fig. 9 shows another example of generating a poem based on a picture according to an embodiment of the present disclosure;
Fig. 10 shows a scenario of generating a poem according to a picture and several days’ chatting log according to an embodiment of the present disclosure;
Fig. 11 shows a scenario of using a word map to generate a poem according to an embodiment of the present disclosure;
Fig. 12 is a block diagram depicting an exemplary computing device according to an embodiment of the present disclosure; and
Fig. 13 is a block diagram depicting an exemplary system for generating content according to an embodiment of the present disclosure.
Detailed Description
As used in this application and in the claims, the singular forms “a” , “an” , and “the” include the plural forms as well as the singular unless the context clearly dictates otherwise. Additionally, the word "including" has the same broad meaning as the word "comprising" . Further, the term “coupled” does not exclude the presence of intermediate elements between the coupled items.
The methods described herein should not be construed as limiting in any way. Instead, the present disclosure is directed toward all novel and non-obvious features and aspects of the various disclosed embodiments, alone and in various combinations and sub-combinations with one another. The disclosed methods are not limited to any specific aspect or feature or combinations thereof, nor do the disclosed methods require that any one or more specific advantages be present or problems be solved. Any theories of operation are to facilitate explanation, but the disclosed systems, methods, and apparatus are not limited to such theories of operation.
Although the operations of some of the disclosed methods are described in a particular sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods may be used in conjunction with other methods. Additionally, the description sometimes uses terms like "produce" and "provide"  to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.
In some examples, values, procedures, or apparatus are referred to as “lowest” , “best” , “minimum” or the like. It will be appreciated that such descriptions are intended to indicate that a selection among many used functional alternatives may be made, and such selections need not be better, smaller, or otherwise preferable to other selections.
In the following discussion, an exemplary environment is first described that is operable to employ the techniques described herein. Example illustrations of the various embodiments are then described, which may be employed in the exemplary environment, as well as in other environments. Accordingly, the example environment is not limited to performing the described embodiments and the described embodiments are not limited to implementation in the example environment.
Fig. 1 shows an environment 100 in an example embodiment that is operable to employ the techniques described in this document. In some example embodiments, the environment 100 for operating content generation includes a terminal device 101 that may communicate with a computing device via networks. In Fig. 1, the terminal device 101 may transmit a picture to the computing device. After receiving this picture transmitted by the terminal device 101, the computing device obtains a word set and generates content by using the word set. Finally, the generated content is fed back to the terminal device 101.
In various examples, the computing device may include one or more processors, and one or more network interfaces which enable communications between the computing device and other networked devices such as a terminal device 101. The computing device may be a server or a cloud server. The terminal device may be included in the system for generating content in the present disclosure. The terminal device 101 may belong to a variety of categories or classes of devices, such as traditional client-type devices, desktop computer-type devices, mobile-type devices, special purpose-type devices, embedded-type devices, and/or wearable-type devices. The terminal device 101 of the various categories or classes may have one or more processors operably connected to machine-readable media. Executable instructions stored on machine-readable media can include, for example, an operating system and/or modules, programs, or applications that are loadable and executable by the processors. The terminal device 101 also includes image storage for storing personal images. The images may be captured using a device-associated camera, although in some embodiments, the images could be captured using a device other than the camera included in the terminal device 101. In some embodiments, the terminal device may not include a camera, but images may be uploaded or otherwise transferred to the terminal device 101, e.g., from a device equipped with a camera. The terminal device 101 may also include one or more network interfaces 1013 to enable communications between the terminal device 101 and the computing devices. In one example, the terminal device 101 may send the image or other original material to the computing device for generating content based on the transmitted original material. The terminal device 101 may then display the generated content.
In an embodiment of the present disclosure, the terminal device 101 may include a processor 1011, a memory 1012, a network interface 1013, a display 1014, a chatting application 1015, and an I/O interface 1016.
In an embodiment of the present disclosure, the terminal device 101 may further comprise a speaker and a microphone for providing voice input and output.
In an embodiment of the present disclosure, the network may include public networks such as the Internet, private networks such as an institutional and/or personal intranet, or some combination of private and public networks. The network may also include any type of wired and/or wireless network, including but not limited to local area networks (LANs) , wide area networks (WANs) , satellite networks, cable networks, Wi-Fi networks, WiMAX networks, mobile communications networks (e.g., 3G, 4G, and so forth) or any combination thereof. The network may utilize communications protocols, including packet-based and/or datagram-based protocols such as internet protocol (IP) , transmission control protocol (TCP) , user datagram protocol (UDP) , or other types of protocols. Moreover, the network may also include a number of devices that facilitate network communications and/or form a hardware basis for the networks, such as switches, routers, gateways, access points, firewalls, base stations, repeaters, backbone devices, and the like.
In an embodiment of the present disclosure, the network may further include devices that enable connection to a wireless network, such as a wireless access point (WAP) . The network may support connectivity through WAPs that send and receive data over various electromagnetic frequencies (e.g., radio frequencies) , including WAPs that support Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (e.g., 802.11g, 802.11n, and so forth) , and other standards.
Although embodiments of the present disclosure will be described hereinafter by taking a poem as an example of the content, the content generated according to the embodiment of the present disclosure may include, but is not limited to poetry, novel, essay and play. Since the process of generating the above various content is similar or the same, the method of generating content is described below by taking poetry as an example.
Fig. 2 is a flow chart of a method for generating a poem according to an embodiment of the present disclosure. Generally, a poem may include at least one segment, and the term “segment” used herein may refer to, for example, a sentence, one or several words in the poem.
As shown in Fig. 2, original material is received at step S101. In an embodiment, the original material may include a picture, an audio, a video, context or any combination thereof provided during a conversation with a chatbot. In an example, the original material may be in the form of a message, which may refer to a text including at least one word. It is to be noted that the term “word” used herein throughout the specification refers not only to a word but also to a Chinese character in the case the content is generated in Chinese. Since the content may be generated based on various kinds of original material, the process of generating the content may be made simpler and more fun.
In an embodiment of the present disclosure, both an image and some words may be received to generate a poem during a conversation. For example, on a rainy day, the user or a chatbot may talk to another user or another chatbot via the terminal device. The user may input a message including some words to express that he/she feels sad, and then take a picture of a scene in the rain. The method according to the embodiment of the present disclosure will generate a poem based on both the picture and the user’s sadness (i.e., the message expressing sadness). In another example embodiment, the user may take a picture while listening to music, and the method of the present disclosure will generate a poem based on the picture and the music. In addition, one or more pictures may be provided as the original material to generate a poem. Therefore, more convenience and fun may be provided when generating a poem by using the method according to embodiments of the present disclosure.
In an embodiment of the present disclosure, in addition to the original material provided as described previously, different styles of poem may also be provided for the user to select. For example, the user may select a style of poem, such as Crescent School Poem or Misty Poem, before or after he/she inputs a picture (for example, takes a picture or selects an existing picture). The method of the present disclosure will generate a poem based on the picture and the selected style.
At step S102, a first word set including at least a word describing an affective feature of the original material is obtained. The word describing an affective feature may be an adjective, such as the word “busy” describing the noun “city” , or a noun, such as the word “smile” expressing a happy mood. The affective feature of an image refers to the various subjective experiences that the image evokes, such as feelings, impressions, or even affection. Since words describing affective features may be obtained, the generated poem may be more impressive and touching to the reader. How to obtain the word describing an affective feature of the original material will be described in detail hereinafter.
At step S103, a poem is generated by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) .
In an embodiment of the present disclosure, the generation of each segment of the poem employs a recurrent neural network language model (RNNLM) to predict the word next to the word selected for the segment. Each word of the segment of the poem may be predicted sequentially from the previous word sequence:
P (w_1:n) = ∏_i P (w_i | w_1:i-1)      (1)
where w_i is the i-th word and w_1:i-1 represents the preceding word sequence.
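As a minimal illustration of equation (1), a toy bigram model can stand in for the RNNLM and score a sentence by multiplying conditional probabilities; the vocabulary and probability values below are invented for illustration only.

```python
# Chain-rule scoring: P(w_1..w_n) = prod_i P(w_i | w_{1:i-1}).
# A bigram approximation P(w_i | w_{i-1}) stands in for the RNNLM here;
# all probabilities are made up for illustration.
BIGRAM = {
    ("<sos>", "the"): 0.5,
    ("the", "city"): 0.2,
    ("city", "flows"): 0.1,
}

def sentence_prob(words):
    """Multiply conditional probabilities word by word, starting at <sos>."""
    p = 1.0
    prev = "<sos>"
    for w in words:
        p *= BIGRAM.get((prev, w), 1e-6)  # back off to a tiny floor probability
        prev = w
    return p

print(sentence_prob(["the", "city", "flows"]))  # 0.5 * 0.2 * 0.1 = 0.01
```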
Due to the directionality of the RNNLM, one may only generate forward from an existing word. To allow the word to appear at any position in a sentence, a simple idea is to train a reverse version of the RNNLM (which trains on the corpus in reverse order) and generate backward from the existing word:
P (w_1:n) = ∏_i P (w_i | w_i+1:n)      (2)
However, if the forward and backward parts are generated separately, the result will be two independent parts without semantic connection. With the simple recursive strategy described below, this problem may be overcome.
Assume that <sos> and <eos> represent the start and end symbols of a sentence, and that LM_forward and LM_backward are the forward and backward RNNLMs. The process of generating the j-th segment l_j from the j-th word k_j of the poem is described in Algorithm 1.
Algorithm 1 Generation of a segment based on the RNN
1: sequence ← k_j
2: while <sos> ∉ sequence or <eos> ∉ sequence do
3:     if <eos> ∉ sequence then
4:         w = argmax_w P (w | sequence, LM_forward)
5:         sequence ← sequence + w
6:     if <sos> ∉ sequence then
7:         w = argmax_w P (w | sequence, LM_backward)
8:         sequence ← w + sequence
9: l_j ← sequence
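Algorithm 1 can be sketched in Python as follows. The two lookup tables below stand in for the trained forward and backward RNNLMs; each maps the word at the growing edge of the sequence to its argmax continuation, and all entries are invented for illustration.

```python
SOS, EOS = "<sos>", "<eos>"

# Stand-ins for LM_forward / LM_backward: argmax continuation tables
# (invented for illustration; a real system would query the trained RNNLMs).
FORWARD = {"city": "flows", "flows": "slowly", "slowly": EOS}
BACKWARD = {"city": "the", "the": SOS}

def generate_segment(keyword):
    """Grow a segment in both directions from a keyword (Algorithm 1)."""
    sequence = [keyword]
    while SOS not in sequence or EOS not in sequence:
        if EOS not in sequence:                       # extend forward
            sequence.append(FORWARD[sequence[-1]])
        if SOS not in sequence:                       # extend backward
            sequence.insert(0, BACKWARD[sequence[0]])
    return " ".join(sequence[1:-1])                   # strip <sos> / <eos>

print(generate_segment("city"))  # -> "the city flows slowly"
```

Because the loop alternates the forward and backward extensions on the same sequence, the two halves always share the keyword as common context, avoiding the two-independent-parts problem noted above.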
For each segment, a forward RNN and a backward RNN are used to predict words next to a word in the first word set until two line breaks, representing the beginning and the end of the segment respectively, are predicted. For the next segment, the above two RNNs are used to predict words next to a word in the first word set according to that word and the previous segment.
In one embodiment of the present disclosure, the first segment of the poem may be generated based on a first word, and the second segment of the poem may be generated based on the generated first segment and a second word in the word set. Therefore, the correlation among the generated segments of the poem may be improved.
In one embodiment of the present disclosure, the poem may include more than one paragraph, each including a plurality of segments, and a segment of at least one of the paragraphs may be predicted from a word in the first word set used in a previous paragraph. For example, if the poem includes two paragraphs, each paragraph consisting of four segments, six words may be obtained, of which four are used for generating the first paragraph. The method may reuse two words that have been used in the first paragraph, together with the remaining two words, to generate the second paragraph. By reusing the words, the segments in different paragraphs may comprise the same words, and an effect of echoing before and after may be achieved. Therefore, the correlation of the segments of the poem may be further improved.
In an embodiment of the present disclosure, after the poem is generated segment by segment, an inverted index may be established based on the segments of the generated poem and the first word set, and the RNN may be trained based on the inverted index. Specifically, in the inverted index, the word used for generating one segment of the poem, the other words in that segment, and the segment itself are indexed to establish a correspondence from the words to the sentence. Then, the inverted index may be fed back to the RNN for learning. In future generation, if a word in this segment other than the word used for generating it is obtained, the same segment is retrieved. Therefore, the diversity of the generated poems may be improved.
In an embodiment of the present disclosure, after one segment of the poem is generated, the method may further include performing a check on the generated words or sentences based on an N-gram and a skip N-gram model to measure whether a word is appropriate and whether two words in a sentence are semantically consistent.
In an embodiment of the present disclosure, after the poem is generated, the method may further include a checking step to determine the completeness of the generated poem.
Through these checks, the generated poem will be relevant to the image content, fluent in language and coherent in semantics.
For example, in an incomplete sentence “a green and big” , the trained RNN may predict that the word after the adjective “big” should be “mountain” or “trees” . However, if the sentence is generated as “a green and big sea” , since “big” has the highest co-occurrence with the words “mountain” or “trees” , the N-gram model may detect that the word “sea” in this sentence is inappropriate.
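A minimal sketch of such an n-gram plausibility check is given below; the co-occurrence counts are invented for illustration and a real system would read them from a trained corpus.

```python
# Toy bigram co-occurrence counts from a corpus (invented numbers).
# A candidate word is flagged when it co-occurs with its predecessor
# far less often than the best alternative does.
COOCCUR = {("big", "mountain"): 120, ("big", "trees"): 95, ("big", "sea"): 2}

def is_appropriate(prev, word, ratio=0.1):
    """Accept `word` after `prev` only if its count is within `ratio`
    of the strongest continuation of `prev`."""
    best = max(c for (p, _), c in COOCCUR.items() if p == prev)
    return COOCCUR.get((prev, word), 0) >= ratio * best

print(is_appropriate("big", "mountain"))  # True
print(is_appropriate("big", "sea"))       # False: 2 < 0.1 * 120
```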
The process of obtaining a first word set including at least a word describing affective feature of the original material according to an embodiment of the present disclosure will be described as follows by referring to Fig. 3.
At step S301, an initial word set (also referred to as a second word set) may be extracted by applying, on the original material, two parallel convolutional neural networks (CNNs), of which one is for extracting the word describing an affective feature and the other is for extracting a word describing an object from the original material. The two CNNs may be trained by using a great number of images, music and/or videos. The two CNNs share the same network architecture but have different parameters. Specifically, one CNN learns to describe objects by outputting noun words, and the other CNN learns to understand affective features by outputting adjective words. The two CNNs may be pre-trained on ImageNet and fine-tuned on noun and adjective categories, respectively. For each CNN, the extracted deep convolutional representation is denoted as W_c*I, where W_c denotes the overall parameters of the CNN, * denotes a set of operations of convolution, pooling and activation, and I denotes the input image. Based on this deep representation, a probability distribution P over the output categories is further generated, shown as:
P (C|I) = f (W_c*I)      (3)
where f (·) represents fully-connected layers that map the convolutional features to a feature vector matched with the category entries, followed by a softmax layer that further transforms the feature vector into probabilities. For the proposed parallel CNNs, the probabilities over noun and adjective categories are denoted as p_n (C|I) and p_a (C|I), respectively. Categories with high probabilities are chosen to establish the word set for generating the poem.
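The final softmax step of equation (3), and the selection of high-probability categories, can be sketched as follows; the raw category scores and the 0.2 cut-off are invented for illustration.

```python
import math

def softmax(scores):
    """Transform raw category scores into a probability distribution."""
    m = max(scores.values())                      # subtract max for stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

# Invented fully-connected outputs for a few noun categories.
logits = {"city": 3.0, "street": 1.5, "road": 0.2}
probs = softmax(logits)
chosen = [c for c, p in sorted(probs.items(), key=lambda kv: -kv[1]) if p > 0.2]
print(chosen)  # -> ['city']: only the high-probability category is kept
```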
In one embodiment of the present disclosure, for each CNN in the parallel architecture, GoogLeNet is selected as the basic network structure, since it may produce first-class performance in the ImageNet competition. Then, 272 nouns and 181 adjectives are selected as the categories for noun and adjective CNN training, since these categories are adequate to describe common objects and affective features in images. Training each CNN takes about 50 hours on a Tesla K40 GPU, with top-one classification accuracies of 92.3% and 85.4% on the initial word set, respectively.
Then, at step S302, inappropriate words may be removed from the initial word set to generate an intermediate word set (also referred to as a third word set). For example, in an embodiment, words with confidence scores lower than a predefined threshold may be removed from the initial word set. Because of this removal, the words for generating the poem may be more accurate.
At step S303, the first word set may be obtained based on the intermediate word set. For example, in an embodiment, a word with high co-occurrence with a word in the intermediate word set may be selected from a corpus based on at least one literary genre including modern poems, and the first word set may be obtained by adding the selected words into the intermediate word set. Additionally or alternatively, a frequent word in the corpus may also be selected. For obtaining a more accurate first word set, in an embodiment of the present disclosure, a word that not only occurs frequently in the corpus but also has high co-occurrence with a word from the third word set may be selected, so as to generate the first word set by adding the selected words into the third word set.
In an embodiment of the present disclosure, a word map including a plurality of words may be generated based on the corpus. In the word map, a direct or indirect connection may exist between any two of the plurality of words. A word in the third word set that is rarely used may be replaced with a word connected with the rare word in the word map. Additionally or alternatively, a word that has a connection with a word in the third word set may be selected from the word map, and the first word set may be obtained by adding the selected words into the intermediate word set. An example of the word map according to an embodiment of the present disclosure will be described in detail hereinafter with reference to Fig. 11.
In such a way, the first word set comprising more accurate words may be obtained. Therefore, a long accurate poem may be generated easily. Of course, in some circumstances, the initial word set or the intermediate word set themselves may be directly used as the first word set.
An example embodiment will be provided hereinafter in which a picture taken by the user will be used as the original material to generate a poem.
Fig. 4 is an example of generating a poem based on a picture according to an embodiment of the present disclosure. A process of generating a poem from the picture in Fig. 4 will be described in detail as follows. Fig. 4 illustrates a scenario where a user is talking to a chatbot while on the way to work and complaining about the heavy morning traffic. Therefore, the user may wish the chatbot to write a poem based on the picture of the traffic shot by the user.
After receiving the instruction to write a poem and the picture shot by the user, the terminal device may transmit the picture to the computing device. The computing device may extract an initial word set by applying, on the picture, two parallel CNNs, wherein one CNN is for extracting the word describing an affective feature and the other CNN is for extracting a word describing an object from the picture. The initial word set thus obtained may be {city, street, traffic, road, tiny car, busy, scare, broken} . In an example, the words for generating a poem may be identified by the CNNs, which are trained by using ImageNet. Among these, the words “city” , “street” , “traffic” and “tiny car” are nouns, and the words “busy” , “scare” and “broken” are words describing affective features.
Generally, the initial word set may comprise words that are not appropriate for the picture. As may be seen, the word “broken” in the initial word set is obviously not related to this traffic scenario. Therefore, to remove irrelevant words, it is proposed to calculate confidence scores for the above eight words in the initial word set {city, street, traffic, road, tiny car, busy, scare, broken} and remove the words with low confidence scores. As a result, only words with high confidence scores will be used in the next step. In an embodiment of the present disclosure, words with confidence scores less than a predefined value, for example 0.5, may be removed. Table 1 shows the words and their corresponding confidence scores. As may be seen, since the words “city” and “busy” have confidence scores of at least 0.5, these two words may be kept and the other six words with low confidence scores may be removed. Therefore, the third word set (i.e., the intermediate word set) {city, busy} may be generated.
Table 1 Words and corresponding confidence scores
Words Confidence scores
City 0.61
Street 0.26
Traffic light 0.03
Road 0.03
Tiny car 0.01
Busy 0.50
Scare 0.16
Broken 0.07
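Using the confidence scores of Table 1, the thresholding step can be sketched as follows; the 0.5 threshold follows the example above.

```python
# Confidence scores taken from Table 1; words scoring below the
# predefined threshold (0.5 in the example) are removed.
SCORES = {"city": 0.61, "street": 0.26, "traffic light": 0.03, "road": 0.03,
          "tiny car": 0.01, "busy": 0.50, "scare": 0.16, "broken": 0.07}

def filter_words(scores, threshold=0.5):
    """Keep only words whose confidence is at least the threshold."""
    return {w for w, s in scores.items() if s >= threshold}

print(filter_words(SCORES))  # the intermediate word set: {'city', 'busy'}
```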
However, the two words “city” and “busy” alone are not sufficient to generate a poem, for example a poem including four segments. Specifically, if only the words with high confidence scores are used to generate some segments (for example, the first two segments) of the poem, and the next two segments are generated only based on their previous segments without expanding the words, the correlation of the generated segments will decrease. Therefore, it is necessary to expand the intermediate word set into a first word set including sufficient words to generate a complete poem. Specific implementations for obtaining the first word set based on the intermediate word set are described as follows.
Frequent word
After removing rare and inappropriate words, words are sampled according to the word distribution of the training corpus. The more frequently a word occurs in the corpus, the greater the chance the word will be selected. The nouns with the highest frequency of occurrence in the corpus are “life” , “time” and “place” . Applying these words may enhance both the variation and imagination of the generated poem without deviating too much from the theme.
The corpus may be trained on a great number of literary genres, such as modern poetry and fiction.
High co-occurred word
Another implementation is to consider words with high co-occurrence with the words of the initial word set. An embodiment of the present disclosure samples words according to the distribution of their co-occurrence frequency with the words of the initial word set. The more often a word co-occurs with the selected words of the initial word set, the greater the chance it will be selected as a word for generation. Taking the picture in Fig. 4 as an example, the words with the highest co-occurrence with the word “city” , namely “place” , “child” , “heart” and “land” , may be selected as the words for generation. Unlike the highly frequent words, these highly co-occurring words usually have more relevance to the words obtained directly from the picture. Therefore, the result may reflect more specific topics of the poems.
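The two expansion strategies can be sketched together as below. The frequency and co-occurrence counts are invented for illustration; in the described method they would be computed from the trained corpus.

```python
# Invented corpus statistics: overall word frequency, and co-occurrence
# counts with the seed word "city" from the intermediate word set.
FREQ = {"life": 900, "time": 850, "place": 800, "pebble": 3}
COOCCUR_CITY = {"place": 40, "child": 30, "heart": 25, "land": 20, "life": 5}

def expand(seed_set, k=2):
    """Score candidates by frequency times co-occurrence with the seed words,
    so a word must be both common and topically related to rank highly."""
    scores = {w: FREQ.get(w, 1) * COOCCUR_CITY.get(w, 0) for w in COOCCUR_CITY}
    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return seed_set | set(best)

print(expand({"city", "busy"}, k=1))  # adds "place": frequent AND co-occurring
```

Multiplying the two counts is one simple way to combine the “frequent word” and “high co-occurred word” criteria; the described method may weight them differently.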
Table 2 shows three approaches to obtaining a first word set and the corresponding average score, word irrelevant rate and segment irrelevant rate. The without_expand approach generates content by using only the appropriate words from the intermediate word set, the expand_freq approach generates content using frequent words, and the expand_cona approach generates content by using words with high co-occurrence.
While there is no obvious difference between the average scores in Table 2, the three approaches show totally different performance on relevance to the image query. Since without_expand only uses words with high confidence scores, it has the lowest word irrelevant rate. However, due to the uncontrollability of generation without guiding words, the irrelevant rate of the generated content increases dramatically. The approaches expand_freq and expand_cona may reduce the segment irrelevant rate by enlarging the word set with additional words. While the word irrelevant rate of expand_freq increases slightly, by considering word co-occurrence, the method expand_cona attains the best segment relevant rate and decreases the word irrelevant rate from 18.7% to 15.6%.
Table 2 shows the approaches of obtaining a first word set and corresponding average score, word irrelevant rate and segment irrelevant rate
Figure PCTCN2018092398-appb-000007
In an embodiment of the present disclosure, “Frequent word” and “High co-occurred word” may be combined to generate the first word set. For example, a word that is not only frequent in the corpus but also has high co-occurrence with a word in the third word set is “place” . Therefore, the word “place” , which may be more relevant to the picture, may be included in the first word set.
In such a way, a first word set of {city, busy, place, smile} may be obtained. A poem may be generated based on the first word set and the RNN, which is trained by using at least one literary genre including modern poetry. The generation of the poem may include generating a first segment of the poem based on one word in the first word set through a forward RNN and a backward RNN, and generating a next segment based on another word in the first word set and the previously generated segments.
How to generate each segment of the poem has been described hereinabove. Now the present disclosure will take a specific picture and a specific first word set as an example to introduce how to generate each segment of the poem with reference to Fig. 5.
Fig. 5 illustrates a schematic diagram of generating content in English according to an embodiment of the present disclosure. The first word set {city, busy, place, smile} generated as above will be taken as an example. The process of generating the first segment of the poem is as follows. The word in the first word set for generating the first segment may be “city” . A trained forward RNN model predicts that the next word following “city” with high probability is “flows” . However, the word “city” is not necessarily located at the beginning of the segment. Therefore, it is necessary to use a backward RNN model to predict the location of the word “city” in this segment. For example, if the backward RNN model predicts with high probability that a line break is in front of the word “city” , it is determined that the word “city” should be located at the beginning of this segment. Then the forward RNN model keeps predicting that “slowly” , “behind” and “him” follow the word “flows” . At last, the forward RNN model predicts another line break after the word “him” . Since another line break is predicted, this segment of the poem is complete, and is generated as “the city flows slowly behind him” . Other segments of the poem may be generated in a similar way as described hereinabove. Finally, a poem including four segments is generated:
The city flows slowly behind him
My life is busy
That’s why we keep silent in a place no one knows
With lips curl into phony smile
As can be seen, the words “busy” and “smile” are at the end of their respective segments of the poem, and the word “place” is in the middle of a segment. Therefore, through the forward and backward RNN models, the word for generating a segment of the modern poem may appear at any position in that segment, unlike machine methods for generating acrostic poems, in which the word must be located at the beginning of each segment.
Fig. 6 illustrates a schematic diagram of generating content in Chinese according to an embodiment of the present disclosure. For a Chinese poem, assume the first word set is {城市, 忙碌, 地方, 生活} . If the first word in the first word set is “城市” , a trained forward RNN model predicts that the next word following “城市” with high probability is “在” . Then the backward RNN model predicts with high probability that there is a line break before the word “城市” ; therefore, it is determined that the word “城市” should be at the beginning of the first segment of the poem. Then the forward RNN model keeps predicting the words “在” , “他” , “身后” , “缓缓地” and “流” in order until it predicts another line break, which represents that the first segment is complete. Therefore, the first segment of the poem is generated as “城市在他身后缓缓地流” . Finally, the four segments of the Chinese poem are generated as:
城市在他身后缓缓地流
我的生活忙碌
我们才在没人知道的地方寂静
嘴边挂着虚假的笑容
In an embodiment of the present disclosure, the fluency of segments of the poem may be controlled with the RNNLM model and a recursive strategy. In a multi-word and multi-segment scenario, maintaining consistency between segments of the poem is necessary. Since a segment may be generated in two directions, using the state of the RNNLM to pass information is no longer feasible. An embodiment of the present disclosure may extend the input gate of the RNNLM model to two parts: one is the original previous-word input, and the other is the previous segment’s information.
For the poem-level network, an embodiment of the present disclosure predicts the content vector of the next segment of the poem from all the previously generated segments. For generating the j-th segment l_j of the poem:
P (l_j | l_1:j-1) = ∏_i P (w_i | w_1:i-1, l_1:j-1)      (4)
The process of generating the next segment of the poem will be described hereinafter by taking the first segment of the poem “the city flows slowly behind him” as an example.
Fig. 7 illustrates a hierarchical poetry model including sentence level and poem level according to an embodiment of the present disclosure.
The process of generating the segments of the poem will be described as follows with reference to Fig. 7. In Fig. 7, w_0 represents the start of the next segment of the poem (i.e., a line break), and w_1, ..., w_i+1 represent the words for generating the next segment. In addition, l_0 represents the start of the poem (i.e., a null line), and l_1, ..., l_i represent the previously generated segments of the poem.
It should be noted that Fig. 7 only illustrates the forward-version hierarchical poetry model; the backward version may be obtained by reversing the dotted line. Since the principles of the forward and backward versions of the hierarchical poetry model are the same, the backward-version hierarchical poetry model of Fig. 8 will not be repeated here.
This hierarchical poem model affects the generation of the next segment of the poem by using information about the previous segments. The RNN encodes a segment of the poem into a vector through a sentence encoder. When the first segment of the poem has been generated, the generation of the next segment should consider not only the word for the next segment but also the generated first segment. For example, when the first segment of the poem is generated, the RNN will automatically generate a state serving as the encoding of the previous segment, and this state, including the information about the previous sentence, is input into the generation of the next segment.
The process of generating the first and second segments, “the city flows slowly behind him” and “my life is busy” , will be taken as an example. First, in the sentence-level LSTM, the first segment “the city flows slowly behind him” is generated according to the forward RNN, the backward RNN and the word “city” in the first word set. Then the generated segment is fed back to the poem-level LSTM as l_1. After passing through the sentence encoder, l_1 is encoded as l′_1 and input to the sentence-level LSTM to serve as predicted content for generating the next segment, which should include the word “busy” . Finally, based on the word “busy” and the previous segment “the city flows slowly behind him” , the next segment “my life is busy” is generated and then returned to the poem-level LSTM for future generation.
For generating a Chinese poem, the above generated poem may be taken as an example. l_1 represents the generated first segment of the Chinese poem, “城市在他身后缓缓地流” , l_j represents the generated second segment, “我的生活忙碌” , l_j+1 represents the generated third segment, “我们才在没人知道的地方寂静” , and w_i+1 represents the obtained word “笑容” for the last segment of the poem. The RNN model may predict that w_1 is “嘴边” and w_2 is “虚假” . Therefore, based on the previous three segments of the poem and the word “笑容” , the last segment may be generated as “嘴边挂着虚假的笑容” and then input back to the RNN model as a further input to generate a possible next segment. In this way, the consistency between segments of the poem may be improved.
In an embodiment of the present disclosure, after all segments of the poem are generated, an inverted index may be established according to the respective segments of the generated poem and the first word set. Take the first segment of the generated poem, “the city flows slowly behind him” , as an example. After this segment is generated based on the word “city” , an inverted index in which “city” , “flows” , “slowly” , “behind” and “him” each correspond to the segment “the city flows slowly behind him” may be established; the inverted index is then fed back to the trained RNN for learning. Therefore, in the future, if the obtained word is “flows” rather than “city” , the same sentence “the city flows slowly behind him” may still be generated. Therefore, the diversity of the generated poems may be improved.
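The inverted index described above can be sketched as a mapping from every word of a generated segment back to that segment; this is an illustrative data-structure sketch, not the trained model itself.

```python
from collections import defaultdict

def build_inverted_index(segments):
    """Map each word of each generated segment to the full segment, so any
    word of the segment can later retrieve it."""
    index = defaultdict(set)
    for seg in segments:
        for word in seg.split():
            index[word].add(seg)
    return index

index = build_inverted_index(["the city flows slowly behind him"])
# Looking up "flows" retrieves the same segment as looking up "city".
print(index["flows"] == index["city"])  # True
```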
Fig. 9 is another scenario of generating a poem based on a picture according to an embodiment of the present disclosure. The generated poem includes two paragraphs, each including four segments, and two segments of the second paragraph comprise words used in the first paragraph. In this scenario, a user who is travelling in New York is talking to a chatbot and requests the chatbot to write a poem according to a picture of Times Square. By using the method according to the embodiment of the present disclosure, the poem is generated as follows.
As every city is unworthy for the countryside
I have a wonderful finish
I am a rhetorical person every night of insomnia
Hidden in the soul most lost fire
Out of the edge of the city
Wonderful
In the wind
The slightest touch
During the generation of the above poem, which has two paragraphs each including four segments, the poem is generated according to the word set {city, wonderful, person, fire, wind, slightest} . The first paragraph may be generated by using the first four words “city” , “wonderful” , “person” and “fire” . The second paragraph may be generated by using the words “wind” and “slightest” and reusing the words “city” and “wonderful” . By reusing the words “city” and “wonderful” , which have been used in the first paragraph, the consistency among segments of the poem may be improved.
Additionally or alternatively, it should be noted that the obtained word set is not limited to the word set extracted from the image. For example, since the conversation between the user and the chatbot mentions the word “wonderful” , the word “wonderful” may be selected from the conversation to be used as a word for writing the poem. In an example, the conversation between the user and the chatbot may use voice messages or text messages. As can be seen from Fig. 9, the user inputs a voice message “what a wonderful day” by speaking to the terminal device; the computing device then obtains and recognizes the voice message and extracts the word “wonderful” as a word in the first word set to write the poem.
It should be noted that the conversation includes, but is not limited to, a conversation between a user and a chatbot, between two users, or between two chatbots.
It should be noted that the scenarios of the present disclosure are not limited to the conversations described above. The scenarios of the present disclosure may include a scenario in which an application installed on the terminal device receives a picture and generates content according to the present disclosure.
Fig. 10 illustrates a scenario of generating a poem according to a picture in combination with several days’ chat log, according to an embodiment of the present disclosure. In this scenario, on Friday, the user requested the chatbot to add Sunday’s stroll to the reminder. On Sunday, the user uploads a picture taken during the stroll and requests the chatbot to write a poem according to the picture. The word “empty” may be extracted from Friday’s chat log between the user and the chatbot. Therefore, based on the word “empty” from Friday and the words extracted from the picture shot by the user on Sunday, the poem is generated as follows.
Wings holds rocks and water tightly
In the loneliness
Stroll the empty
The land becomes soft
Fig. 11 illustrates a scenario of using a word map to generate a poem according to an embodiment of the present disclosure. In this scenario, the user takes a picture of a sunset and requests the chatbot to generate a poem including two paragraphs, each including four segments, according to the picture. According to the method of the present disclosure, the generated word set may be {cloud, lady, heart, sky} . Obviously, the four words in this word set are not sufficient to generate a complete poem. In addition, the word “lady” in this word set is a rare word that was mainly used at the beginning of the last century. Therefore, to make up for the lack of words in the word set and to replace rare words in the word set, the embodiment of the present disclosure proposes to establish a word map.
In the word map of Fig. 11, there is a direct or indirect connection between the word “lady” and any of the words “woman” , “girl” , “baby” , “beauty” and “cute” . The word “lady” has the same meaning as the words “woman” and “girl” , so direct connections exist between them. The word “lady” may also be extended to the word “baby” , with a direct connection between them. In addition, the word “baby” may be extended to the word “cute” ; therefore an indirect connection may exist between “lady” and “cute” .
Therefore, if the word “lady” is a rare word that is not appropriate for a modern poem, and a more popular and more appropriate word such as “woman” or “girl” can be found through the word map, the word “lady” may be replaced with the word “girl”. In this way, the diversity of the word set may be improved. Moreover, since the word map may find a word more appropriate for the semantic environment, the generated poem may match the image more closely and touch the heart of the reader. In addition, through the word map, the words “beauty” and “cute” describing affective features are also extended and selected for generating the poem. Therefore, through the method for generating content according to the present disclosure in combination with the word map, the poem is generated as follows.
The cloud is an unruly shock
A cute girl with beautiful hope
You don't have heart
Only the sky is like a cloud
Cloud in the dream
Miserable melody enters the beauty of my heart
But the Chinese universe
Stay in your heart
Fig. 12 is a block diagram depicting an exemplary computing device 102 according to an embodiment of the present disclosure. The computing device may comprise one or more processors 1021 and a memory 1022 storing machine-executable instructions that, when executed, cause the one or more processors to perform actions comprising: receiving original material; obtaining a first word set including at least a word describing an affective feature of the original material; and generating content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
In an embodiment of the present disclosure, the original material may comprise at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
In an embodiment of the present disclosure, the action of obtaining a first word set comprises: extracting a second word set by applying, on the original material, two parallel convolutional neural networks (CNN), of which one is for extracting a word describing sentiment and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
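The two-branch extraction may be sketched as follows. The two extractor functions are stand-ins for the trained CNNs, which are not reproduced here, and the returned (word, confidence) pairs are illustrative only:

```python
# Stand-ins for the two parallel CNNs; a real system would run trained image
# models here. Each returns (word, confidence) pairs for the input material.
def affective_cnn(material):
    return [("lonely", 0.8), ("soft", 0.3)]

def object_cnn(material):
    return [("rock", 0.9), ("water", 0.7)]

def extract_second_word_set(material):
    """Apply the two extractors in parallel and merge their outputs."""
    return affective_cnn(material) + object_cnn(material)
```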
In an embodiment of the present disclosure, the action of obtaining the first word set based on the second word set comprises: removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus based on at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and generating the first word set by combining the selected words and the third word set.
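Assuming the co-occurrence statistics have been precomputed from the corpus into a simple mapping (the `cooccur` dictionary and the parameter names below are hypothetical), the filtering and expansion may be sketched as:

```python
def build_first_word_set(second_word_set, cooccur, threshold=0.5, top_k=1):
    """Drop low-confidence words, then expand the survivors with words that
    co-occur with them in the corpus.

    second_word_set : list of (word, confidence) pairs
    cooccur         : dict mapping a word to its co-occurring words, most
                      frequent first (assumed precomputed from the corpus)
    """
    # Remove words whose confidence score is below the threshold.
    third_word_set = [w for w, score in second_word_set if score >= threshold]
    # Select words with high co-occurrence with words in the third word set.
    selected = []
    for word in third_word_set:
        for cand in cooccur.get(word, [])[:top_k]:
            if cand not in third_word_set and cand not in selected:
                selected.append(cand)
    return third_word_set + selected
```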
In an embodiment of the present disclosure, the corpus is based on at least another literary genre than modern poem and the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
In an embodiment of the present disclosure, the corpus is based on at least another literary genre than modern poem and the method further comprises: generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and generating the first word set by combining the third word set with a word which has a connection with a word in the third word set according to the word map.
In an embodiment of the present disclosure, the actions performed by the computing device further comprise: establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
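A minimal sketch of building such an inverted index, assuming each piece of generated content is keyed by an identifier (the training step itself is omitted):

```python
def build_inverted_index(generated):
    """Map each word of a first word set to the identifiers of the content
    generated from it, so (word set, content) pairs can later be retrieved
    as training examples for the RNN."""
    index = {}
    for content_id, word_set in generated.items():
        for word in word_set:
            index.setdefault(word, []).append(content_id)
    return index
```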
In an embodiment of the present disclosure, the content includes more than one paragraph, each including a plurality of segments, and a segment of a paragraph comprises a word used in a previous paragraph.
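This paragraph linkage may be sketched as follows, with a stub standing in for the pre-trained RNN's segment predictor:

```python
def generate_paragraphs(word_set, predict_segment, n_paragraphs=2):
    """Generate one segment per word; each paragraph after the first reuses
    a word from the previous paragraph so the paragraphs stay linked.
    predict_segment stands in for the pre-trained RNN decoder."""
    content, prev_words = [], None
    for _ in range(n_paragraphs):
        words = list(word_set)
        if prev_words:  # carry one word over from the previous paragraph
            words[0] = prev_words[-1]
        content.append([predict_segment(w) for w in words])
        prev_words = words
    return content
```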
Fig. 13 is a block diagram depicting an exemplary system 400 for generating content according to an embodiment of the present disclosure. The system 400 may comprise a computing device 401 and a terminal device 402. In an embodiment of the present disclosure, the computing device 401 may communicate with the terminal device 402 via a network 403. In an embodiment of the present disclosure, the computing device 401 comprises a memory and one or more processors coupled to the memory; the terminal device 402 is configured to transmit original material to the computing device; and the memory stores machine-executable instructions that, when executed, cause the one or more processors to: receive the original material from the terminal device; obtain a first word set including at least a word describing sentiment implied by the original material; generate content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem; and transmit the generated content to the terminal device.
In an embodiment of the present disclosure, the original material comprises at least one of a picture, an audio, a video, and information from a conversation with a chatbot.
In an embodiment of the present disclosure, the obtaining a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN), of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
In an embodiment of the present disclosure, the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
In conclusion, the present disclosure proposes a method for generating content including at least receiving original material; obtaining a first word set including at least a word describing affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) , wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
In an embodiment of the present disclosure, the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
In an embodiment of the present disclosure, the obtaining a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN), of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
In an embodiment of the present disclosure, the obtaining the first word set based on the second word set comprises removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus trained by at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and obtaining the first word set by adding the selected words into the third word set.
In an embodiment of the present disclosure, the corpus is trained by at least a literary genre besides modern poem and the method further comprises generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
In an embodiment of the present disclosure, the method further comprises generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words and the corpus is trained by at least another literary genre besides modern poem; removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; and obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
In an embodiment of the present disclosure, the method further comprises establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
In an embodiment of the present disclosure, the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
The present disclosure proposes a computing device, comprising one or more processors and a memory storing machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform actions comprising: receiving original material; obtaining a first word set including at least a word describing affective feature of the original material; and generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
In an embodiment of the present disclosure, the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
In an embodiment of the present disclosure, the action of obtaining a first word set comprises obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN), of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
In an embodiment of the present disclosure, the action of obtaining the first word set based on the second word set comprises removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; selecting, from a corpus trained by at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and obtaining the first word set by adding the selected words into the third word set.
In an embodiment of the present disclosure, the corpus is trained by at least a literary genre besides modern poem and the actions further comprise generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
In an embodiment of the present disclosure, the actions further comprise generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words and the corpus is trained by at least another literary genre besides modern poem; removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; and obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
In an embodiment of the present disclosure, the actions further comprise establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
In an embodiment of the present disclosure, the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
The present disclosure proposes a system for generating content, comprising a computing device comprising a memory and one or more processors coupled to the memory, and a terminal device configured to transmit original material to the computing device; wherein the memory stores machine-executable instructions that, when executed, cause the one or more processors to: receive the original material from the terminal device; obtain a first word set including at least a word describing sentiment implied by the original material; generate content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN), wherein the RNN has been pre-trained by using at least one literary genre including modern poem; and transmit the generated content to the terminal device.
In an embodiment of the present disclosure, the original material comprises at least one of a picture, an audio, a video, and information from a conversation with a chatbot.
In an embodiment of the present disclosure, the obtaining a first word set comprises: obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN), of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and obtaining the first word set based on the second word set.
In an embodiment of the present disclosure, the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises: predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
In the following, the method for generating content according to embodiments of the present disclosure will be compared with two existing methods for generating content. The first existing method is Image2caption, which generates English sentences from a given image. Taking the picture in Fig. 10 as an example, the content generated by using Image2caption is:
A tree on a rocky hillside
Next to a tree
The second existing method is an application designed by CTRIP, by which a traditional poem may be generated. Still taking the picture in Fig. 10 as an example, the traditional poem generated from this picture by using the application of CTRIP is:
Unknown grass in the second month of spring
Petals and green woods honest in noon
Want to ask the bee where to go
Pair of butterfly shadows want to walk through path
The poem generated from the picture in Fig. 10 by using the method of the present disclosure is:
Wings holds rocks and water tightly
In the loneliness
Stroll the empty
The land becomes soft
After generating the above poems, to evaluate the performance of the three methods and obtain rich feedback without user bias, 22 assessors from a variety of career fields were chosen, including: 1) 8 female users and 14 male users; 2) 13 users with a bachelor’s degree and 1 user with a master’s or higher degree; 3) 11 users who prefer traditional poetry, 10 users who prefer modern poetry and 1 user who prefers neither.
After a series of evaluations, the method for generating content according to embodiments of the present disclosure gains higher relevance to images by leveraging successful word extraction and expansion, and more easily stimulates interesting ideas and generates emotional resonance.
A person skilled in the art can understand that the operations, methods, steps in the flows, measures and technical solutions discussed in the embodiments of the present disclosure can be replaced, changed, combined or deleted. Further, the other steps, measures and technical solutions in the flows discussed in the disclosure can also be replaced, changed, rearranged, combined or deleted. Further, steps, measures and technical solutions in the prior art corresponding to the operations, methods, flows, measures and technical solutions discussed in the disclosure can also be replaced, changed, rearranged, combined or deleted.
Various features described herein, e.g., methods, device, system and the like, can be realized using any combination of dedicated components and/or programmable processors and/or other programmable devices. The various processes described herein can be implemented on the same processor or different processors in any combination. Where components are described as configured to perform certain operations, such configuration can be accomplished, e.g., by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation, or any combination thereof. Further, while the embodiments described above may refer to specific hardware and software components, those skilled in the art will appreciate that different combinations of hardware and/or software components may also be used and that operations described as being implemented in hardware might also be implemented in software or vice versa.
Machine programs incorporating various features described herein may be encoded and stored on various machine readable storage media; suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk) , flash memory, and other non-transitory media. Machine readable media encoded with the program code may be packaged with a compatible electronic device, or the program code may be provided separately from electronic devices (e.g., via Internet download or as a separately packaged machine-readable storage medium) .
Thus, although the disclosure has been described with respect to specific embodiments, it will be appreciated that the disclosure is intended to cover all modifications and equivalents within the scope of the following claims.

Claims (20)

  1. A method for generating content including at least one segment, comprising:
    receiving original material;
    obtaining a first word set including at least a word describing affective feature of the original material; and
    generating the content by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) ,
    wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  2. The method according to claim 1, wherein the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
  3. The method according to claim 1, wherein the obtaining a first word set comprises:
    obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN) , of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and
    obtaining the first word set based on the second word set.
  4. The method according to claim 3, wherein the obtaining the first word set based on the second word set comprises:
    removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set;
    selecting, from a corpus trained by at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and
    obtaining the first word set by adding the selected words into the third word set.
  5. The method according to claim 4, wherein the corpus is trained by at least a literary genre besides modern poem and the method further comprises:
    generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and
    replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
  6. The method according to claim 3, wherein the method further comprises:
    generating a word map including a plurality of words based on a corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words and the corpus is trained by at least a literary genre besides modern poem;
    removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; and
    obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
  7. The method according to claim 1, further comprising:
    establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
  8. The method according to claim 1, wherein the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises:
    predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  9. A computing device, comprising one or more processors and a memory storing machine-executable instructions that when executed by the one or more processors, cause the one or more processors to perform actions comprising:
    receiving original material;
    obtaining a first word set including at least a word describing affective feature of the original material; and
    generating content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) ,
    wherein the RNN has been pre-trained by using at least one literary genre including modern poem.
  10. The computing device according to claim 9, wherein the original material comprises at least one of a picture, an audio, a video, and context of a conversation with a chatbot.
  11. The computing device according to claim 9, wherein the action of obtaining a first word set comprises:
    obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN) , of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and
    obtaining the first word set based on the second word set.
  12. The computing device according to claim 11, wherein the action of obtaining the first word set based on the second word set comprises:
    removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set;
    selecting, from a corpus trained by at least one literary genre including modern poem, a word with high co-occurrence with a word in the third word set; and
    obtaining the first word set by adding the selected words into the third word set.
  13. The computing device according to claim 12, wherein the corpus is trained by at least a literary genre besides modern poem and the actions further comprises:
    generating a word map including a plurality of words based on the corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words; and
    replacing a rare word in the third word set with a word having a connection with the rare word according to the word map.
  14. The computing device according to claim 11, wherein the actions further comprise:
    generating a word map including a plurality of words based on a corpus, wherein one of a direct and an indirect connection exists between any two of the plurality of words and the corpus is trained by at least a literary genre besides modern poem;
    removing words with confidence score lower than a predefined threshold from the second word set to generate a third word set; and
    obtaining the first word set by adding, into the third word set, a word which has a connection with a word in the third word set according to the word map.
  15. The computing device according to claim 9, wherein the actions further comprise:
    establishing an inverted index according to the generated content and the first word set; and training the RNN based on the inverted index.
  16. The computing device according to claim 9, wherein the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises:
    predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
  17. A system for generating content, comprising:
    a computing device comprising a memory and one or more processors coupled to the memory; and
    a terminal device configured to transmit original material to the computing device;
    wherein the memory stores machine-executable instructions that when executed, cause the one or more processors to:
    receive the original material from the terminal device;
    obtain a first word set including at least a word describing sentiment implied by the original material; and
    generate content including at least one segment by predicting each segment from a respective word in the first word set through a Recurrent Neural Network (RNN) , wherein the RNN has been pre-trained by using at least one literary genre including modern poem; and
    transmit the generated content to the terminal device.
  18. The system according to claim 17, wherein the original material comprises at least one of a picture, an audio, a video, and information from a conversation with a chatbot.
  19. The system according to claim 17, wherein the obtaining a first word set comprises:
    obtaining a second word set by applying, on the original material, two parallel convolutional neural networks (CNN) , of which one is for extracting the word describing affective feature and the other is for extracting a word describing an object from the original material; and
    obtaining the first word set based on the second word set.
  20. The system according to claim 17, wherein the content includes more than one paragraph, each including a plurality of the segments, and the generating of the content comprises:
    predicting a segment of at least one of the paragraphs from a word in the first word set used in a previous paragraph.
PCT/CN2018/092398 2018-06-22 2018-06-22 Method, computing device and system for generating content WO2019242001A1 (en)


Publications (1)

Publication Number Publication Date
WO2019242001A1 true WO2019242001A1 (en) 2019-12-26



Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18923534; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18923534; Country of ref document: EP; Kind code of ref document: A1)