WO2019174186A1 - 诗歌自动生成方法、装置、计算机设备及存储介质 - Google Patents

诗歌自动生成方法、装置、计算机设备及存储介质 Download PDF

Info

Publication number
WO2019174186A1
WO2019174186A1 · PCT/CN2018/102383 · CN2018102383W
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
verse
keywords
poetry
lstm model
Prior art date
Application number
PCT/CN2018/102383
Other languages
English (en)
French (fr)
Inventor
张琛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019174186A1 publication Critical patent/WO2019174186A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, an apparatus, a computer device, and a storage medium for automatically generating poetry.
  • The present application provides a method, apparatus, computer device, and storage medium for automatically generating poetry, aiming to solve the problems of the prior art poetry generation technology based on statistical translation models: the keyword must be in a specified topic vocabulary, so no poem can be generated if this requirement is not met, and the relevance of the entire poem to the topic words is difficult to guarantee.
  • the present application provides a method for automatically generating poems, including:
  • the keywords are sequentially acquired, and the acquired current keyword and the previous keyword are input into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords; wherein the LSTM model is a long short-term memory neural network;
  • the verses corresponding to the keywords are filled into the preset poetry body template to obtain poetry.
  • an automatic poetry generating apparatus including:
  • a keyword obtaining unit configured to acquire the entered attribute information of the person, and extract a preset number of keywords according to the attribute information of the person;
  • a poetry type obtaining unit configured to obtain a poetry generation type according to the number of keywords and their parts of speech;
  • a model input unit configured to sequentially acquire the keywords, input the acquired current keyword and the previous keyword into the LSTM model for encoding and decoding, and obtain verses corresponding one-to-one to the keywords; wherein the LSTM model is a long short-term memory neural network;
  • the poetry combination unit is used to fill the verses corresponding to the keywords one by one into the preset poetry body template to obtain poetry.
  • The present application further provides a computer device comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the computer program, implements any of the methods for automatically generating poetry described above.
  • The present application also provides a storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to execute any of the methods for automatically generating poetry described above.
  • the application provides a method, a device, a computer device and a storage medium for automatically generating poetry.
  • The method automatically determines keywords from the person attribute information and intelligently generates logically coherent verses from those keywords.
  • FIG. 1 is a schematic flow chart of a method for automatically generating poems according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a sub-flow of a method for automatically generating poems according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of another sub-flow of a method for automatically generating poems according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of another sub-flow of a method for automatically generating poems according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of another sub-flow of a method for automatically generating poems according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of an apparatus for automatically generating poems according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a subunit of an automatic poetry generating apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of another subunit of an apparatus for automatically generating poems according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of another subunit of an apparatus for automatically generating poems according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of another subunit of an apparatus for automatically generating poems according to an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flow chart of a method for automatically generating poems according to an embodiment of the present application.
  • the method is applied to terminals such as desktop computers, laptop computers, and tablet computers.
  • the method includes steps S101 to S104.
  • When poetry needs to be generated from specified prompt information (for example, when poetry is used for corporate culture publicity), person attribute information can serve as the data basis for acquiring keywords.
  • The person attribute information includes basic employee information (such as name, gender, and affiliated team), job content information (such as development, system analysis, and testing), and personality information (such as cheerful or quiet).
  • For example, if the specified number of keywords is four, four keywords may be generated from the person attribute information (e.g., the four randomly acquired keywords could be the name, the affiliated team, development, and cheerful);
  • the items of information listed above are merely examples for a specific implementation; the person attribute information is not limited to them and may be any information about an individual.
  • The step S101 includes the following sub-steps:
  • S1011: Number the entered person attribute information in ascending order according to the order of entry;
  • S1012: Obtain N numbers by a random algorithm, and take the person attribute information corresponding one-to-one to the N numbers as the keywords, where N is the preset specified number of keywords.
  • Selecting N numbers by a random algorithm and then selecting the corresponding keywords by those numbers quickly picks out the specified number of keywords without manual selection by the user.
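The numbering and random-selection sub-steps above can be sketched as follows; the attribute values and the helper name `pick_keywords` are hypothetical illustrations, not the patent's implementation:

```python
import random

def pick_keywords(attributes, n):
    """Number the entered attribute values in entry order, then draw
    N distinct numbers at random and return the matching values."""
    numbered = list(enumerate(attributes, start=1))  # ascending numbering
    chosen = random.sample(numbered, n)              # N random numbers
    return [value for _, value in sorted(chosen)]    # keep entry order

attrs = ["Zhang San", "male", "R&D team", "development", "testing", "cheerful"]
print(pick_keywords(attrs, 4))
```

Because `random.sample` draws without replacement, the N keywords are guaranteed to be distinct attribute entries.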
  • the step S102 includes the following sub-steps:
  • The adjective ratio among the keywords is preset; for example, in the initial setting, if the ratio of adjectives is less than 50%, the poetry generation type is set to a five-character quatrain or five-character regulated verse, and if the ratio of adjectives is greater than or equal to 50%, the poetry generation type is set to a seven-character quatrain or seven-character regulated verse.
  • The proportion of adjectives can be adjusted according to actual needs.
  • Once the proportion is set, the input keywords are automatically classified according to the proportion of adjectives among them, and five-character quatrains, five-character regulated verses, seven-character quatrains, or seven-character regulated verses are generated for the respective categories.
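The form-selection rule above reduces to a single threshold test; a minimal sketch (the tag label "adj" and function name are assumptions):

```python
def poetry_type(pos_tags):
    """Choose the poem form from the share of adjectives among the keywords.
    pos_tags: a part-of-speech label per keyword, e.g. ["noun", "adj", ...]."""
    ratio = sum(1 for t in pos_tags if t == "adj") / len(pos_tags)
    if ratio < 0.5:
        return "five-character quatrain or regulated verse"
    return "seven-character quatrain or regulated verse"

print(poetry_type(["noun", "noun", "adj", "noun"]))  # 25% adjectives
```

With one adjective among four keywords the ratio is below 50%, so the five-character forms are chosen.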
  • The part of speech of a keyword is judged by the Viterbi algorithm. Determining the part of speech of a word with the Viterbi algorithm requires, among other statistics:
  • computing, when the following word is fixed to part of speech [n], the proportion of cases in which the preceding word has part of speech [x] (for example, when the following word is fixed to [verb], the proportion of occurrences of the preceding word [noun] among all [x][verb] pairs), recorded in fshift[TYPE_NUM][TYPE_NUM];
  • S103: Sequentially acquire the keywords, input the acquired current keyword and the previous keyword into the LSTM model for encoding and decoding, and obtain verses corresponding one-to-one to the keywords; wherein the LSTM model is a long short-term memory neural network.
  • the step S103 includes the following sub-steps:
  • S1031 Acquire a first keyword, input the first keyword into an LSTM model for encoding and decoding, and obtain a first verse corresponding to the first keyword;
  • S1032 Obtain a second keyword, input the second keyword and the first keyword into the LSTM model for encoding and decoding, and obtain a second verse corresponding to the second keyword;
  • S1033 Obtain a third keyword, input the third keyword and the second keyword into the LSTM model for encoding and decoding, and obtain a third verse corresponding to the third keyword;
  • S1034 Obtain a fourth keyword, input the fourth keyword and the third keyword into the LSTM model for encoding and decoding, and obtain a fourth verse corresponding to the fourth keyword.
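The four sub-steps above are one loop in which each verse is conditioned on the current keyword and the previous one. A minimal sketch of that loop; `generate_verse` is a hypothetical stand-in for the trained two-layer LSTM encoder-decoder:

```python
def compose_poem(keywords, generate_verse):
    """generate_verse(current, previous) stands in for the trained LSTM
    encoder-decoder; previous is None for the first keyword (step S1031)."""
    verses = []
    previous = None
    for kw in keywords:
        verses.append(generate_verse(kw, previous))
        previous = kw  # the next verse conditions on this keyword too
    return verses

# toy stand-in model, for illustration only
demo = compose_poem(["a", "b"], lambda cur, prev: f"verse({cur},{prev})")
print(demo)
```

Chaining each keyword with its predecessor is what ties consecutive verses together logically.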
  • After the keywords are acquired in sequence, they are input in order into the LSTM model that has been trained on historical data, and verses corresponding one-to-one to the keywords are generated; the verses can then be combined to form the final poem.
  • The LSTM model is a long short-term memory neural network. The full name of LSTM is Long Short-Term Memory; it is a type of recurrent neural network.
  • LSTM is well suited to processing and predicting important events with very long intervals and delays in a time series.
  • The LSTM model can encode the keywords and perform the preliminary processing for automatic poetry generation.
  • the key to LSTM is the Cell State, which can be thought of as a horizontal line across the top of the entire cell.
  • the cell state is similar to a conveyor belt, which passes directly through the entire chain, with only a few small linear interactions.
  • the information carried on the cell state can easily flow without changing.
  • the LSTM has the ability to add or delete information to the cell state.
  • These capabilities are controlled by gate structures: a gate can selectively pass information, and it consists of a sigmoid neural network layer and an element-wise multiplication operation.
  • The sigmoid layer outputs values between 0 and 1, each value indicating how much of the corresponding information should pass: a value of 0 means no information is allowed through, and a value of 1 means all of it is passed.
  • An LSTM has three gates to protect and control the state of the cell.
  • The three gates of the LSTM are as follows:
  • a forget gate, which determines how much of the previous cell state is retained in the current cell state;
  • an input gate, which determines how much of the network input at the current time is saved to the cell state;
  • an output gate, which determines how much of the cell state is output as the current output value of the LSTM.
  • In this embodiment, the LSTM model is a gated recurrent unit (GRU), modeled as follows:
  • z_t = σ(W_z · [h_{t-1}, x_t]); r_t = σ(W_r · [h_{t-1}, x_t]); h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t]); h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t;
  • where W_z, W_r, and W are weight parameters obtained by training, x_t is the input, h_{t-1} is the hidden state, z_t is the update signal, r_t is the reset signal, h̃_t is the new memory corresponding to the hidden state h_{t-1}, h_t is the output, σ(·) is the sigmoid function, and tanh(·) is the hyperbolic tangent function.
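A minimal NumPy sketch of one GRU step matching the symbols above; each weight matrix multiplies the concatenation [h_{t-1}, x_t], and biases are omitted to keep the sketch small (the function name and shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, W_r, W):
    """One step of the gated recurrent unit described above."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                      # update signal z_t
    r = sigmoid(W_r @ hx)                                      # reset signal r_t
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))   # new memory
    return (1 - z) * h_prev + z * h_tilde                      # output h_t

rng = np.random.default_rng(0)
H, X = 4, 3  # hidden size, input size
h = gru_cell(rng.normal(size=X), np.zeros(H),
             rng.normal(size=(H, H + X)), rng.normal(size=(H, H + X)),
             rng.normal(size=(H, H + X)))
print(h.shape)
```

Starting from a zero hidden state, h_t is z_t ⊙ h̃_t, so every component stays strictly inside (−1, 1).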
  • The keyword is encoded by the first-layer LSTM structure and converted into a sequence consisting of hidden states; after this sequence is decoded, the initially processed verse is obtained.
  • Before the step S101, the method further includes: putting a plurality of keywords in the corpus into the first-layer LSTM structure and the verses corresponding to those keywords into the second-layer LSTM structure, and performing training to obtain the LSTM model.
  • The overall framework of the LSTM model is fixed; the model is obtained simply by setting the parameters of each layer, such as the input layer, hidden layer, and output layer. Optimal values for these parameters can be obtained through repeated experiments. For example, if the hidden layer has 10 nodes and each node parameter can take a value from 1 to 10, then 100 combinations are tried to obtain 100 training models; the 100 models are then trained with a large amount of data, and the optimal training model is selected according to accuracy.
  • The parameters, such as the node values, corresponding to the optimal training model are the optimal parameters (it can be understood that W_z, W_r, and W in the GRU model above are these optimal parameters). Applying the optimal training model to the scheme as the LSTM model ensures that the generated verses are more logically coherent.
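The sweep described above is a grid search: train one model per parameter combination and keep the most accurate. A minimal sketch; `train` and `evaluate` are hypothetical stand-ins for the real training and validation routines:

```python
from itertools import product

def best_model(train, evaluate, hidden_sizes, learning_rates):
    """Try every parameter combination, train a model for each,
    and keep the one with the highest evaluation accuracy."""
    best, best_acc = None, -1.0
    for h, lr in product(hidden_sizes, learning_rates):
        model = train(hidden=h, lr=lr)
        acc = evaluate(model)
        if acc > best_acc:
            best, best_acc = model, acc
    return best, best_acc

# toy stand-ins: pretend accuracy peaks at hidden=8, lr=0.1
model, acc = best_model(
    train=lambda hidden, lr: (hidden, lr),
    evaluate=lambda m: 1.0 - abs(m[0] - 8) / 10 - abs(m[1] - 0.1),
    hidden_sizes=range(1, 11),
    learning_rates=[0.01, 0.1],
)
print(model, acc)
```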
  • the first layer LSTM structure is used to encode the input keywords word by word.
  • For example, take the verse "两个黄鹂鸣翠柳，一行白鹭上青天" ("Two golden orioles sing amid the green willows; a line of white egrets rises into the blue sky"):
  • "翠" (cui) means green and is an adjective;
  • "柳" (liu) is a willow, a noun;
  • "青" (qing) is cyan, an adjective;
  • "天" (tian) is the sky, a noun;
  • "白" (bai), white, is an adjective.
  • After the first-layer LSTM structure finishes encoding the input keywords, the second-layer LSTM structure decodes them into verses; the verse "两个黄鹂鸣翠柳，一行白鹭上青天" is input into the LSTM model as the analysis result for training.
  • The LSTM model automatically learns the arrangement of adjectives and nouns in five-character or seven-character verses and, through learning, automatically fills in the hollowed-out positions of a verse; by repeating similar training many times, an LSTM model that satisfies actual use requirements is obtained.
  • the step S1032 includes the following sub-steps:
  • S10321: Input the second keyword and the first keyword into the first-layer LSTM structure in the LSTM model to obtain a sequence consisting of hidden states;
  • S10322: Input the sequence of hidden states into the second-layer LSTM structure in the LSTM model for decoding, using the Beam Search algorithm (i.e., a beam search over candidate words), to obtain the word sequence of the verse;
  • S10323: Concatenate the word sequence of the verse in order to obtain the second verse.
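The Beam Search algorithm mentioned above keeps only the top-scoring partial sequences at each decoding step. A generic sketch over an arbitrary next-token distribution; the bigram table and function names are toy assumptions, not the patent's decoder:

```python
import heapq
import math

def beam_search(next_probs, start, steps, beam_width):
    """Generic beam search: at every step, expand each kept sequence with all
    candidate tokens, then retain only the beam_width highest-scoring ones.
    next_probs(seq) returns a {token: probability} mapping."""
    beams = [(0.0, [start])]  # (log-probability, sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for tok, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams[0][1]  # best sequence found

# toy bigram model for illustration
table = {"<s>": {"moon": 0.6, "wind": 0.4},
         "moon": {"bright": 0.9, "dark": 0.1},
         "wind": {"cold": 0.8, "warm": 0.2}}
print(beam_search(lambda seq: table[seq[-1]], "<s>", 2, beam_width=2))
```

Summing log-probabilities avoids numeric underflow on long sequences; a beam width of 1 degenerates to greedy decoding.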
  • S104: Fill the verses corresponding one-to-one to the keywords into the preset poetry body template to obtain the poem.
  • The verses are filled into the poetry body template in the order in which they were generated, yielding the poem.
  • The method automatically determines keywords from the person attribute information and intelligently generates logically coherent verses from those keywords.
  • FIG. 6 is a schematic block diagram of an apparatus for automatically generating poems according to an embodiment of the present application.
  • the poetry automatic generating device 100 can be installed in a desktop computer, a tablet computer, a laptop computer, or the like.
  • the poem automatic generation apparatus 100 includes a keyword acquisition unit 101, a poetry type acquisition unit 102, a model input unit 103, and a poem combination unit 104.
  • the keyword obtaining unit 101 is configured to acquire the entered attribute information of the person, and extract a preset number of keywords according to the attribute information of the person.
  • the keyword obtaining unit 101 includes the following subunits:
  • the numbering unit 1011 is configured to number the entered person attribute information in ascending order according to the order of entry;
  • the random obtaining unit 1012 is configured to obtain N numbers by using a random algorithm, and obtain the person attribute information corresponding to the N numbers one by one as a keyword; wherein N is a specified number of preset keywords.
  • N numbers are selected by a random algorithm, and corresponding keywords are selected according to the serial numbers, so that a specified number of keywords can be quickly selected, and the user does not need to manually select them.
  • the poetry type obtaining unit 102 is configured to obtain a poetry generation type according to the number of keywords and the part of speech of the keyword.
  • the poetry type acquisition unit 102 includes the following subunits:
  • the ratio calculation unit 1021 is configured to obtain a ratio of the adjectives in the specified number of keywords
  • the first type selection unit 1022 is configured to set the poetry generation type to a five-character quatrain or five-character regulated verse if the ratio of adjectives is less than 50%;
  • the second type selection unit 1023 is configured to set the poetry generation type to a seven-character quatrain or seven-character regulated verse if the ratio of adjectives is greater than or equal to 50%.
  • The model input unit 103 is configured to sequentially acquire the keywords, input the acquired current keyword and the previous keyword into the LSTM model for encoding and decoding, and obtain verses corresponding one-to-one to the keywords; wherein the LSTM model is a long short-term memory neural network.
  • the model input unit 103 includes the following subunits:
  • the first verse generating unit 1031 is configured to acquire the first keyword, input the first keyword into the LSTM model for encoding and decoding, and obtain a first verse corresponding to the first keyword;
  • the second verse generating unit 1032 is configured to acquire the second keyword, and input the second keyword and the first keyword into the LSTM model for encoding and decoding, to obtain a second verse corresponding to the second keyword;
  • the third verse generating unit 1033 is configured to acquire the third keyword, and input the third keyword and the second keyword into the LSTM model for encoding and decoding, to obtain a third verse corresponding to the third keyword;
  • the fourth verse generating unit 1034 is configured to acquire the fourth keyword, and input the fourth keyword and the third keyword into the LSTM model for encoding and decoding, to obtain a fourth verse corresponding to the fourth keyword.
  • the poem automatic generating apparatus 100 further includes:
  • the model training unit is configured to put a plurality of keywords in the corpus into the first layer LSTM structure, and put the verses corresponding to the keywords into the second layer LSTM structure, and perform training to obtain the LSTM model.
  • the second verse generating unit 1032 includes the following subunits:
  • the first input unit 10321 is configured to input the second keyword and the first keyword into the first layer LSTM structure in the LSTM model to obtain a sequence consisting of an implicit state;
  • a second input unit 10322 configured to input a sequence consisting of an implicit state into a second layer LSTM structure in the LSTM model for decoding, to obtain a word sequence of the verse;
  • the verse concatenation unit 10323 is configured to serially connect the word sequences of the verses to obtain a second verse.
  • The verse combination unit 104 is configured to fill the verses corresponding one-to-one to the keywords into the preset poetry body template to obtain the poem.
  • The device automatically determines keywords from the person attribute information and intelligently generates logically coherent verses from those keywords.
  • the above-described automatic poetry generating apparatus can be implemented in the form of a computer program which can be run on a computer device as shown in FIG.
  • FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The computer device 500 can be a terminal.
  • the terminal can be an electronic device such as a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected by a system bus 501, wherein the memory can include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform an automatic poetry generation method.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, which when executed by the processor 502, causes the processor 502 to perform a method of automatic poetry generation.
  • The network interface 505 is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 11 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: acquiring the entered person attribute information and extracting a preset specified number of keywords according to the person attribute information; obtaining the corresponding poetry generation type according to the number of keywords and their parts of speech; and sequentially acquiring the keywords, inputting the acquired current keyword and the previous keyword into the LSTM model for encoding and decoding, and obtaining verses corresponding one-to-one to the keywords;
  • wherein the LSTM model is a long short-term memory neural network; the verses corresponding one-to-one to the keywords are filled into the preset poetry body template to obtain the poem.
  • The processor 502 further performs the following operations: numbering the entered person attribute information in ascending order according to the order of entry; and acquiring N numbers by a random algorithm and taking the person attribute information corresponding one-to-one to the N numbers as the keywords, where N is the preset specified number of keywords.
  • The processor 502 further performs the operations of: obtaining the ratio of adjectives among the specified number of keywords; if the ratio of adjectives is less than 50%, setting the poetry generation type to a five-character quatrain or five-character regulated verse; and if the ratio of adjectives is greater than or equal to 50%, setting the poetry generation type to a seven-character quatrain or seven-character regulated verse.
  • The processor 502 further performs the following operations: acquiring the first keyword, inputting the first keyword into the LSTM model for encoding and decoding, and obtaining the first verse corresponding to the first keyword; acquiring the second keyword, inputting the second keyword and the first keyword into the LSTM model for encoding and decoding, and obtaining the second verse corresponding to the second keyword; acquiring the third keyword, inputting the third keyword and the second keyword into the LSTM model for encoding and decoding, and obtaining the third verse corresponding to the third keyword; and acquiring the fourth keyword, inputting the fourth keyword and the third keyword into the LSTM model for encoding and decoding, and obtaining the fourth verse corresponding to the fourth keyword; wherein the specified number is 4.
  • The processor 502 further performs the following operations: inputting the second keyword and the first keyword into the first-layer LSTM structure in the LSTM model to obtain a sequence consisting of hidden states; inputting the sequence of hidden states into the second-layer LSTM structure in the LSTM model for decoding to obtain the word sequence of the verse; and concatenating the word sequence of the verse in order to obtain the second verse.
  • the embodiment of the computer device shown in FIG. 11 does not constitute a limitation on the specific configuration of the computer device.
  • The computer device may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components.
  • the computer device may include only a memory and a processor. In such an embodiment, the structure and function of the memory and the processor are the same as those of the embodiment shown in FIG. 11, and details are not described herein again.
  • the processor 502 may be a central processing unit (CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc.
  • the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • In another embodiment of the present application, a storage medium is provided.
  • the storage medium can be a computer readable storage medium.
  • the storage medium stores a computer program, wherein the computer program includes program instructions.
  • the poem automatic generation method of the embodiment of the present application is implemented when the program instruction is executed by the processor.
  • the storage medium may be an internal storage unit of the aforementioned device, such as a hard disk or a memory of the device.
  • The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or a flash card equipped on the device.
  • the storage medium may also include both an internal storage unit of the device and an external storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a method, apparatus, computer device, and storage medium for automatically generating poetry. The method includes: acquiring entered person attribute information, and extracting a preset specified number of keywords according to the person attribute information; obtaining the corresponding poetry generation type according to the number of keywords and their parts of speech; sequentially acquiring the keywords, and inputting the acquired current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords; and filling the verses corresponding one-to-one to the keywords into a preset poetry body template to obtain the poem. The method automatically determines keywords from person attribute information and intelligently generates logically coherent verses from those keywords.

Description

Method, apparatus, computer device, and storage medium for automatically generating poetry
This application claims priority to the Chinese patent application No. 201810213456.4, filed with the Chinese Patent Office on March 15, 2018 and entitled "Method, apparatus, computer device, and storage medium for automatically generating poetry", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method, apparatus, computer device, and storage medium for automatically generating poetry.
Background
At present, artificial intelligence has achieved automatic poem composition, mostly by generating a poem from a single opening keyword, i.e., producing a five-character or seven-character poem. That is, the prior art commonly adopts poetry generation based on a statistical translation model, which has the following defects: 1) the topic word can only generate the first verse, so the relevance of the whole poem to the topic word is hard to guarantee; 2) the topic word must appear in a specified topic vocabulary, so topic words that cannot appear in the vocabulary, such as personal names and place names, cannot be handled.
Summary
The present application provides a method, apparatus, computer device, and storage medium for automatically generating poetry, aiming to solve the problems of the prior art poetry generation technology based on statistical translation models, in which the topic word must be in a specified topic vocabulary so that no poem can be generated if this requirement is not met, and the relevance of the whole poem to the topic words is difficult to guarantee.
In a first aspect, the present application provides a method for automatically generating poetry, including:
acquiring entered person attribute information, and extracting a preset specified number of keywords according to the person attribute information;
obtaining the corresponding poetry generation type according to the number of keywords and their parts of speech;
sequentially acquiring the keywords, and inputting the acquired current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
filling the verses corresponding one-to-one to the keywords into a preset poetry body template to obtain the poem.
In a second aspect, the present application provides an apparatus for automatically generating poetry, including:
a keyword acquisition unit configured to acquire entered person attribute information and extract a preset specified number of keywords according to the person attribute information;
a poetry type acquisition unit configured to obtain the corresponding poetry generation type according to the number of keywords and their parts of speech;
a model input unit configured to sequentially acquire the keywords, input the acquired current keyword and the previous keyword into an LSTM model for encoding and decoding, and obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
a verse combination unit configured to fill the verses corresponding one-to-one to the keywords into a preset poetry body template to obtain the poem.
In a third aspect, the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the computer program, implements any of the methods for automatically generating poetry provided by the present application.
In a fourth aspect, the present application further provides a storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute any of the methods for automatically generating poetry provided by the present application.
The present application provides a method, apparatus, computer device, and storage medium for automatically generating poetry. The method automatically determines keywords from person attribute information and intelligently generates logically coherent verses from those keywords.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below depict some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for automatically generating poetry according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a sub-flow of the method for automatically generating poetry according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another sub-flow of the method for automatically generating poetry according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-flow of the method for automatically generating poetry according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another sub-flow of the method for automatically generating poetry according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of an apparatus for automatically generating poetry according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of a subunit of the apparatus for automatically generating poetry according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of another subunit of the apparatus for automatically generating poetry according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of another subunit of the apparatus for automatically generating poetry according to an embodiment of the present application;
FIG. 10 is a schematic block diagram of another subunit of the apparatus for automatically generating poetry according to an embodiment of the present application;
FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for automatically generating poetry according to an embodiment of the present application. The method is applied to terminals such as desktop computers, laptop computers, and tablet computers. As shown in FIG. 1, the method includes steps S101 to S104.
S101、获取所录入的人员属性信息,并根据人员属性信息提取预设指定个数的关键词。
在本实施例中,当需要根据指定提示信息生成诗歌(例如将诗歌用作企业文化宣传)时,可以将人员属性信息作为获取关键词的数据基础。例如,人员属性信息包括员工基本信息(如名字、性别、所属团队等)、工作内容特征信息(如开发、系统分析、测试等)、性格特征信息(开朗、沉闷等)。例如,若指定个数的关键词为4个,则可以根据所述人员属性信息生成4个关键词(例如随机获取的4个关键词为姓名、所属团队、开发、开朗);上述人员属性信息包括的各项信息只是用于具体实施时的举例,并不局限于上述列举的信息,也即人员属性信息可以是针对任何个人的信息。
In an embodiment, as shown in FIG. 2, step S101 includes the following sub-steps:
S1011: Number the entered personnel attribute information in ascending order according to the order of entry.
S1012: Obtain N numbers through a random algorithm, and take the personnel attribute information corresponding one-to-one to the N numbers as the keywords, where N is the preset specified number of keywords.
In this embodiment, selecting N serial numbers through a random algorithm and selecting the keywords accordingly makes it possible to quickly pick out the specified number of keywords without manual selection by the user.
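Sub-steps S1011 and S1012 can be sketched in a few lines; the function name, the sample attribute list, and the use of Python's `random.sample` are illustrative assumptions, not part of the claimed implementation:

```python
import random

def pick_keywords(attributes, n):
    """Number the entered attribute values in entry order (S1011),
    then draw n distinct indices at random and return the matching
    values as keywords (S1012)."""
    indices = random.sample(range(len(attributes)), n)  # n distinct numbers
    return [attributes[i] for i in sorted(indices)]     # keep entry order

attrs = ["Zhang San", "male", "R&D team", "development", "testing", "cheerful"]
keywords = pick_keywords(attrs, 4)
print(keywords)  # 4 of the entered attribute values, chosen at random
```

Because `random.sample` draws without replacement, the same attribute can never be selected twice, matching the one-to-one correspondence between numbers and keywords.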
S102: Obtain a corresponding poetry generation type according to the number of keywords and their parts of speech.
In an embodiment, as shown in FIG. 3, step S102 includes the following sub-steps:
S1021: Obtain the proportion of adjectives among the specified number of keywords.
S1022: If the proportion of adjectives is less than 50%, set the poetry generation type to five-character quatrain or five-character regulated verse.
S1023: If the proportion of adjectives is greater than or equal to 50%, set the poetry generation type to seven-character quatrain or seven-character regulated verse.
In this embodiment, the adjective-proportion threshold is preset: initially, if the proportion of adjectives is less than 50%, the poetry generation type is set to five-character quatrain or five-character regulated verse; if it is greater than or equal to 50%, the type is set to seven-character quatrain or seven-character regulated verse. By setting this threshold, a seven-character quatrain or regulated verse is generated when adjectives dominate the input keywords, making the generated poem richer in content. The threshold can be adjusted to actual needs; once it is set, the input keywords are automatically classified by their adjective proportion, and a five-character quatrain or regulated verse, or a seven-character quatrain or regulated verse, is generated accordingly.
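The 50% rule of sub-steps S1021–S1023 reduces to a small helper; the POS tag strings and function name below are illustrative assumptions:

```python
def poem_type(keyword_pos_tags, threshold=0.5):
    """Map the adjective ratio among the keywords to a poem form.
    keyword_pos_tags: POS tags of the keywords, e.g. ["n", "a", ...],
    where "a" marks an adjective (tag scheme assumed for illustration)."""
    ratio = sum(1 for t in keyword_pos_tags if t == "a") / len(keyword_pos_tags)
    if ratio < threshold:
        return "five-character quatrain/regulated verse"   # 五言绝句或五言律诗
    return "seven-character quatrain/regulated verse"      # 七言绝句或七言律诗

print(poem_type(["n", "a", "n", "n"]))  # 25% adjectives → five-character form
```

The `threshold` parameter mirrors the text's note that the proportion can be adjusted to actual needs.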
The part of speech of each keyword is determined with the Viterbi algorithm, which requires the following operations:
1) Prepare a corpus containing a large number of sentences whose parts of speech have already been correctly annotated.
2) Collect statistics over the corpus to obtain the following data:
all possible parts of speech;
all words that occur;
the number of times each word occurs under each part of speech;
the number of times each part of speech occurs as the first word of a sentence;
the number of times each pair of parts of speech occurs adjacently in a sentence (for example, for the sentence "看电影" ("watch a movie"), the count for [verb][noun] is incremented).
3) From these statistics, compute the following:
the proportion of each part of speech among all sentence-initial parts of speech (for example, the share of verbs among all sentence-initial parts of speech), recorded in fstart[TYPE_NUM];
with the following word's part of speech fixed to [n], the proportion of cases where the preceding word has part of speech [x] (for example, with the following word fixed to [verb], the share of [noun] among all [x][verb] pairs), recorded in fshift[TYPE_NUM][TYPE_NUM];
the proportion of each word's occurrences under a given part of speech among all occurrences of that part of speech (for example, the share of "中国" among all noun occurrences), recorded in ffashe[TYPE_NUM][60000].
4) Enter a sentence for part-of-speech tagging.
Each word in the input sentence may have several possible parts of speech, and a suitable combination must be selected. For example, for the input sentence "希望" + "的" + "田野", with p1, p2, and p3 candidate parts of speech respectively, the number of possible combinations is S = p1 × p2 × p3, and the optimal combination must be selected from the S candidates.
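The tables of steps 1)–3) define a hidden Markov model, and the Viterbi step 4) selects the best of the S combinations without enumerating them all. A toy sketch, assuming normalized probability tables in place of the raw counts (here `femit` plays the role of the text's `ffashe` emission table, and the two-tag corpus values are invented for illustration):

```python
def viterbi(words, tags, fstart, fshift, femit):
    """Pick the most probable tag sequence for `words`.
    fstart[t]: P(sentence starts with tag t); fshift[p][t]: P(t follows p);
    femit[t][w]: P(word w emitted under tag t)."""
    # best[t] = (probability, path) of the best partial sequence ending in t
    best = {t: (fstart[t] * femit[t].get(words[0], 0.0), [t]) for t in tags}
    for w in words[1:]:
        nxt = {}
        for t in tags:
            p, path = max(
                ((best[pt][0] * fshift[pt][t], best[pt][1]) for pt in tags),
                key=lambda x: x[0],
            )
            nxt[t] = (p * femit[t].get(w, 0.0), path + [t])
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]

tags = ["v", "n"]
fstart = {"v": 0.6, "n": 0.4}
fshift = {"v": {"v": 0.1, "n": 0.9}, "n": {"v": 0.5, "n": 0.5}}
femit = {"v": {"看": 0.9}, "n": {"电影": 0.8}}
print(viterbi(["看", "电影"], tags, fstart, fshift, femit))  # ['v', 'n']
```

Each step keeps only the best path per tag, so the cost is linear in sentence length rather than the product p1 × p2 × … of per-word candidates.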
S103: Acquire the keywords in sequence, and input the current keyword and the previous keyword into the LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, where the LSTM model is a long short-term memory neural network.
In an embodiment, as shown in FIG. 4, when the specified number is limited to 4, step S103 includes the following sub-steps:
S1031: Acquire the first keyword, and input the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword.
S1032: Acquire the second keyword, and input both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword.
S1033: Acquire the third keyword, and input both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword.
S1034: Acquire the fourth keyword, and input both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
After the keywords are acquired in sequence, they are fed in order into the LSTM model trained on historical data, generating verses corresponding one-to-one to the keywords; combining the verses yields the final poem.
After the keywords are acquired, they are input into the LSTM model for processing. LSTM stands for Long Short-Term Memory, a recurrent neural network suited to processing and predicting important events separated by very long intervals and delays in a time series. The LSTM model encodes the keywords as the preliminary step of automatic poem composition.
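The chained inputs of S1031–S1034 generalize to any number of keywords: each call receives the current keyword together with the previous one, which is what ties consecutive verses together. A sketch with a stand-in `model` callable (the real system would invoke the trained two-layer LSTM's encode-decode here; nothing below is the claimed model itself):

```python
def generate_poem_lines(keywords, model):
    """Feed the keywords in order; from the second verse on, the previous
    keyword is passed in together with the current one (S1031-S1034),
    so each generated verse stays logically linked to the one before it.
    `model(current, previous)` stands in for the trained LSTM call."""
    lines, previous = [], None
    for current in keywords:
        lines.append(model(current, previous))
        previous = current
    return lines

# A trivial stand-in model that just echoes its inputs:
demo = generate_poem_lines(["a", "b", "c"], lambda cur, prev: (prev, cur))
print(demo)  # [(None, 'a'), ('a', 'b'), ('b', 'c')]
```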
For a clearer understanding of the LSTM model, it is introduced below.
The key to an LSTM is the cell state, which can be viewed as a horizontal line running across the top of the cell. The cell state is like a conveyor belt: it runs straight through the whole chain with only minor linear interactions, so the information it carries can flow through largely unchanged. An LSTM can add information to or remove information from the cell state, a capability controlled by gate structures: a gate selectively lets information through and consists of a sigmoid neural network layer and an element-wise multiplication. The sigmoid layer outputs values between 0 and 1, each indicating whether the corresponding piece of information should pass: 0 lets nothing through, and 1 lets everything through. An LSTM has three gates to protect and control the cell state.
An LSTM includes at least the following three gates:
1) the forget gate, which decides how much of the previous time step's cell state is retained at the current time step;
2) the input gate, which decides how much of the network's current input is stored in the cell state;
3) the output gate, which decides how much of the cell state is passed to the LSTM's current output value.
In an embodiment, the LSTM model is a gated recurrent unit (GRU), whose model is as follows:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t])
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where W_z, W_r, and W are weight parameter values obtained by training, x_t is the input, h_{t-1} is the hidden state, z_t is the update signal, r_t is the reset signal, h̃_t is the new memory corresponding to the hidden state h_{t-1}, h_t is the output, σ() is the sigmoid function, and tanh() is the hyperbolic tangent function.
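The four GRU formulas can be checked numerically in a few lines of NumPy; the weight shapes, the random initialization, and the omission of bias terms are simplifying assumptions of this sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, W):
    """One gated-recurrent-unit step, following the four formulas above.
    Each weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                      # update signal z_t
    r = sigmoid(Wr @ hx)                                      # reset signal r_t
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))  # new memory
    return (1.0 - z) * h_prev + z * h_tilde                   # output h_t

rng = np.random.default_rng(0)
dim_h, dim_x = 4, 3
Wz = rng.standard_normal((dim_h, dim_h + dim_x))
Wr = rng.standard_normal((dim_h, dim_h + dim_x))
W = rng.standard_normal((dim_h, dim_h + dim_x))
h = gru_step(rng.standard_normal(dim_x), np.zeros(dim_h), Wz, Wr, W)
print(h.shape)  # (4,)
```

Note how h_t interpolates between the old state h_{t-1} and the new memory h̃_t under the control of z_t, which is what lets the unit carry information across long spans.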
After a keyword passes through the first-layer LSTM structure for encoding, it is converted into a sequence of hidden states; decoding this sequence yields the initially processed verse.
In an embodiment, the method further includes, before step S101:
S101a: Place multiple keywords from the corpus into the first-layer LSTM structure, place the verses corresponding to the keywords into the second-layer LSTM structure, and train to obtain the LSTM model.
The overall framework of the LSTM model is fixed; the model is obtained simply by setting the parameters of its input, hidden, output, and other layers, and the optimal parameter values can be found through repeated experiments. For instance, if the hidden layer has 10 nodes and each node's value can range from 1 to 10, 100 combinations would be tried to obtain 100 training models; these 100 models are then trained with a large amount of data, and the optimal model is selected according to accuracy and similar criteria. The node values and other parameters of this optimal model are the optimal parameters (W_z, W_r, and W in the GRU model above can be understood as these optimal parameters). Using the optimal trained model as the LSTM model in this solution ensures that the generated verses are more logically coherent.
For example, in training the LSTM model on historical data, when keywords containing "翠柳" (green willow), "青天" (blue sky), and "白色" (white) are input, the first-layer LSTM structure encodes the input keywords word by word: "翠" means green and is defined as an adjective; "柳" is willow, a noun; "青" means blue, an adjective; "天" is sky, a noun; and "白色" (white) is an adjective. After the first-layer LSTM structure finishes encoding the input keywords, the second-layer LSTM structure decodes them into verses. With the verse "两个黄鹂鸣翠柳，一行白鹭上青天" ("Two golden orioles sing amid green willows; a line of white egrets climbs the blue sky") input into the LSTM model as the parsing result, the model automatically learns how adjectives and nouns are arranged in seven-character quatrain or regulated-verse lines, and fills vacant positions in a line through learning. Repeating this kind of training many times yields an LSTM model that meets practical requirements.
In an embodiment, as shown in FIG. 5, step S1032 includes the following sub-steps:
S10321: Input both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states.
S10322: Input the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence.
S10323: Concatenate the verse's word sequence in order to obtain the second verse.
In this embodiment, the beam search algorithm is used to decode the sequence of hidden states, as follows:
1) Take the most probable candidate from the sequence of hidden states as the initial element of the verse's word sequence.
2) Combine each character of the current sequence with the characters in the vocabulary to obtain the candidates after combination, and keep the most probable candidates as the updated sequence; repeat this process until every character in the sequence has been combined with the vocabulary's terminator symbol, then output the verse's word sequence.
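The decoding procedure described above corresponds to beam search over per-step character distributions. A self-contained sketch (the probability tables, beam width, and end symbol `</s>` are invented for illustration):

```python
import math

def beam_search(step_probs, beam_width, eos="</s>"):
    """Decode a character sequence from per-step probability tables.
    step_probs[t] maps each candidate character to its probability at
    step t; at every step only the `beam_width` most probable partial
    sequences are kept, and a sequence ends when it emits `eos`."""
    beams = [([], 0.0)]  # (sequence, log-probability)
    for probs in step_probs:
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished: carry over as-is
                continue
            for ch, p in probs.items():
                candidates.append((seq + [ch], score + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]

steps = [
    {"两": 0.6, "一": 0.4},
    {"个": 0.7, "行": 0.3},
    {"</s>": 1.0},
]
print(beam_search(steps, beam_width=2))  # ['两', '个', '</s>']
```

Log-probabilities are summed instead of multiplying raw probabilities so that long sequences do not underflow, a standard choice in sequence decoding.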
S104: Fill the verses corresponding one-to-one to the keywords into a preset poem body template to obtain the poem.
In this embodiment, after the first through Nth verses are generated, they are filled into the poem body template in the order of generation to obtain the poem.
It can be seen that the method automatically determines keywords from personnel attribute information and intelligently generates logically coherent verses from those keywords.
An embodiment of this application further provides an automatic poetry generation apparatus for performing any of the foregoing automatic poetry generation methods. Specifically, referring to FIG. 6, FIG. 6 is a schematic block diagram of an automatic poetry generation apparatus according to an embodiment of this application. The automatic poetry generation apparatus 100 can be installed in terminals such as desktop computers, tablet computers, and laptop computers.
As shown in FIG. 6, the automatic poetry generation apparatus 100 includes a keyword acquisition unit 101, a poetry type acquisition unit 102, a model input unit 103, and a verse combination unit 104.
The keyword acquisition unit 101 is configured to acquire entered personnel attribute information and extract a preset specified number of keywords from the personnel attribute information.
In an embodiment, as shown in FIG. 7, the keyword acquisition unit 101 includes the following sub-units:
a numbering unit 1011, configured to number the entered personnel attribute information in ascending order according to the order of entry; and
a random acquisition unit 1012, configured to obtain N numbers through a random algorithm and take the personnel attribute information corresponding one-to-one to the N numbers as the keywords, where N is the preset specified number of keywords.
In this embodiment, selecting N serial numbers through a random algorithm and selecting the keywords accordingly makes it possible to quickly pick out the specified number of keywords without manual selection by the user.
The poetry type acquisition unit 102 is configured to obtain a corresponding poetry generation type according to the number of keywords and their parts of speech.
In an embodiment, as shown in FIG. 8, the poetry type acquisition unit 102 includes the following sub-units:
a ratio calculation unit 1021, configured to obtain the proportion of adjectives among the specified number of keywords;
a first type selection unit 1022, configured to set the poetry generation type to five-character quatrain or five-character regulated verse if the proportion of adjectives is less than 50%; and
a second type selection unit 1023, configured to set the poetry generation type to seven-character quatrain or seven-character regulated verse if the proportion of adjectives is greater than or equal to 50%.
The model input unit 103 is configured to acquire the keywords in sequence and input the current keyword and the previous keyword into the LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, where the LSTM model is a long short-term memory neural network.
In an embodiment, as shown in FIG. 9, when the specified number is limited to 4, the model input unit 103 includes the following sub-units:
a first verse generation unit 1031, configured to acquire the first keyword and input the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword;
a second verse generation unit 1032, configured to acquire the second keyword and input both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword;
a third verse generation unit 1033, configured to acquire the third keyword and input both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and
a fourth verse generation unit 1034, configured to acquire the fourth keyword and input both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
In an embodiment, the automatic poetry generation apparatus 100 further includes:
a model training unit, configured to place multiple keywords from the corpus into the first-layer LSTM structure, place the verses corresponding to the keywords into the second-layer LSTM structure, and train to obtain the LSTM model.
In an embodiment, as shown in FIG. 10, the second verse generation unit 1032 includes the following sub-units:
a first input unit 10321, configured to input both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states;
a second input unit 10322, configured to input the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and
a verse concatenation unit 10323, configured to concatenate the verse's word sequence in order to obtain the second verse.
The verse combination unit 104 is configured to fill the verses corresponding one-to-one to the keywords into the preset poem body template to obtain the poem.
It can be seen that the apparatus automatically determines keywords from personnel attribute information and intelligently generates logically coherent verses from those keywords.
The automatic poetry generation apparatus described above can be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 11.
Referring to FIG. 11, FIG. 11 is a schematic block diagram of a computer device according to an embodiment of this application. The computer device 500 may be a terminal, such as a tablet computer, laptop computer, desktop computer, or personal digital assistant.
As shown in FIG. 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, can cause the processor 502 to perform an automatic poetry generation method.
The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 can cause the processor 502 to perform an automatic poetry generation method.
The network interface 505 is used for network communication, such as sending assigned tasks. A person skilled in the art will understand that the structure shown in FIG. 11 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or arrange the components differently.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: acquiring entered personnel attribute information, and extracting a preset specified number of keywords from the personnel attribute information; obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech; acquiring the keywords in sequence, and inputting the current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, where the LSTM model is a long short-term memory neural network; and filling the verses corresponding one-to-one to the keywords into a preset poem body template to obtain a poem.
In an embodiment, the processor 502 further performs the following operations: numbering the entered personnel attribute information in ascending order according to the order of entry; and obtaining N numbers through a random algorithm and taking the personnel attribute information corresponding one-to-one to the N numbers as the keywords, where N is the preset specified number of keywords.
In an embodiment, the processor 502 further performs the following operations: obtaining the proportion of adjectives among the specified number of keywords; if the proportion of adjectives is less than 50%, setting the poetry generation type to five-character quatrain or five-character regulated verse; and if the proportion of adjectives is greater than or equal to 50%, setting the poetry generation type to seven-character quatrain or seven-character regulated verse.
In an embodiment, the processor 502 further performs the following operations: acquiring the first keyword, and inputting the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword; acquiring the second keyword, and inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword; acquiring the third keyword, and inputting both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and acquiring the fourth keyword, and inputting both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword; where the specified number is 4.
In an embodiment, the processor 502 further performs the following operations: inputting both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states; inputting the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and concatenating the verse's word sequence in order to obtain the second verse.
A person skilled in the art will understand that the embodiment of the computer device shown in FIG. 11 does not limit the specific constitution of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 11 and are not repeated here.
It should be understood that in the embodiments of this application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like; the general-purpose processor may be a microprocessor or any conventional processor.
In another embodiment of this application, a storage medium is provided. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, where the computer program includes program instructions. When executed by a processor, the program instructions implement the automatic poetry generation method of the embodiments of this application.
The storage medium may be an internal storage unit of the aforementioned device, such as the device's hard disk or memory. The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the device. Further, the storage medium may include both the internal storage unit and the external storage device of the device.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

  1. An automatic poetry generation method, comprising:
    acquiring entered personnel attribute information, and extracting a preset specified number of keywords from the personnel attribute information;
    obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech;
    acquiring the keywords in sequence, and inputting the current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
    filling the verses corresponding one-to-one to the keywords into a preset poem body template to obtain a poem.
  2. The automatic poetry generation method according to claim 1, wherein said acquiring entered personnel attribute information and extracting a preset specified number of keywords from the personnel attribute information comprises:
    numbering the entered personnel attribute information in ascending order according to the order of entry; and
    obtaining N numbers through a random algorithm, and taking the personnel attribute information corresponding one-to-one to the N numbers as the keywords, wherein N is the preset specified number of keywords.
  3. The automatic poetry generation method according to claim 1, wherein said obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech comprises:
    obtaining the proportion of adjectives among the specified number of keywords;
    if the proportion of adjectives is less than 50%, setting the poetry generation type to five-character quatrain or five-character regulated verse; and
    if the proportion of adjectives is greater than or equal to 50%, setting the poetry generation type to seven-character quatrain or seven-character regulated verse.
  4. The automatic poetry generation method according to claim 1, wherein the specified number is 4; and
    said acquiring the keywords in sequence and inputting the current keyword and the previous keyword into the LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords comprises:
    acquiring the first keyword, and inputting the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword;
    acquiring the second keyword, and inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword;
    acquiring the third keyword, and inputting both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and
    acquiring the fourth keyword, and inputting both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
  5. The automatic poetry generation method according to claim 4, wherein said inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain the second verse corresponding to the second keyword comprises:
    inputting both the second keyword and the first keyword into a first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states;
    inputting the sequence of hidden states into a second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and
    concatenating the verse's word sequence in order to obtain the second verse.
  6. An automatic poetry generation apparatus, comprising:
    a keyword acquisition unit, configured to acquire entered personnel attribute information and extract a preset specified number of keywords from the personnel attribute information;
    a poetry type acquisition unit, configured to obtain a corresponding poetry generation type according to the number of keywords and their parts of speech;
    a model input unit, configured to acquire the keywords in sequence and input the current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
    a verse combination unit, configured to fill the verses corresponding one-to-one to the keywords into a preset poem body template to obtain a poem.
  7. The automatic poetry generation apparatus according to claim 6, wherein the poetry type acquisition unit comprises:
    a ratio calculation unit, configured to obtain the proportion of adjectives among the specified number of keywords;
    a first type selection unit, configured to set the poetry generation type to five-character quatrain or five-character regulated verse if the proportion of adjectives is less than 50%; and
    a second type selection unit, configured to set the poetry generation type to seven-character quatrain or seven-character regulated verse if the proportion of adjectives is greater than or equal to 50%.
  8. The automatic poetry generation apparatus according to claim 6, wherein the poetry type acquisition unit comprises:
    a ratio calculation unit, configured to obtain the proportion of adjectives among the specified number of keywords;
    a first type selection unit, configured to set the poetry generation type to five-character quatrain or five-character regulated verse if the proportion of adjectives is less than 50%; and
    a second type selection unit, configured to set the poetry generation type to seven-character quatrain or seven-character regulated verse if the proportion of adjectives is greater than or equal to 50%.
  9. The automatic poetry generation apparatus according to claim 6, wherein the specified number is 4; and
    the model input unit comprises:
    a first verse generation unit, configured to acquire the first keyword and input the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword;
    a second verse generation unit, configured to acquire the second keyword and input both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword;
    a third verse generation unit, configured to acquire the third keyword and input both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and
    a fourth verse generation unit, configured to acquire the fourth keyword and input both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
  10. The automatic poetry generation apparatus according to claim 9, wherein the second verse generation unit comprises:
    a first input unit, configured to input both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states;
    a second input unit, configured to input the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and
    a verse concatenation unit, configured to concatenate the verse's word sequence in order to obtain the second verse.
  11. A computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    acquiring entered personnel attribute information, and extracting a preset specified number of keywords from the personnel attribute information;
    obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech;
    acquiring the keywords in sequence, and inputting the current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
    filling the verses corresponding one-to-one to the keywords into a preset poem body template to obtain a poem.
  12. The computer device according to claim 11, wherein said acquiring entered personnel attribute information and extracting a preset specified number of keywords from the personnel attribute information comprises:
    numbering the entered personnel attribute information in ascending order according to the order of entry; and
    obtaining N numbers through a random algorithm, and taking the personnel attribute information corresponding one-to-one to the N numbers as the keywords, wherein N is the preset specified number of keywords.
  13. The computer device according to claim 11, wherein said obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech comprises:
    obtaining the proportion of adjectives among the specified number of keywords;
    if the proportion of adjectives is less than 50%, setting the poetry generation type to five-character quatrain or five-character regulated verse; and
    if the proportion of adjectives is greater than or equal to 50%, setting the poetry generation type to seven-character quatrain or seven-character regulated verse.
  14. The computer device according to claim 11, wherein the specified number is 4; and
    said acquiring the keywords in sequence and inputting the current keyword and the previous keyword into the LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords comprises:
    acquiring the first keyword, and inputting the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword;
    acquiring the second keyword, and inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword;
    acquiring the third keyword, and inputting both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and
    acquiring the fourth keyword, and inputting both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
  15. The computer device according to claim 14, wherein said inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain the second verse corresponding to the second keyword comprises:
    inputting both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states;
    inputting the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and
    concatenating the verse's word sequence in order to obtain the second verse.
  16. A storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the following operations:
    acquiring entered personnel attribute information, and extracting a preset specified number of keywords from the personnel attribute information;
    obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech;
    acquiring the keywords in sequence, and inputting the current keyword and the previous keyword into an LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords, wherein the LSTM model is a long short-term memory neural network; and
    filling the verses corresponding one-to-one to the keywords into a preset poem body template to obtain a poem.
  17. The storage medium according to claim 16, wherein said acquiring entered personnel attribute information and extracting a preset specified number of keywords from the personnel attribute information comprises:
    numbering the entered personnel attribute information in ascending order according to the order of entry; and
    obtaining N numbers through a random algorithm, and taking the personnel attribute information corresponding one-to-one to the N numbers as the keywords, wherein N is the preset specified number of keywords.
  18. The storage medium according to claim 16, wherein said obtaining a corresponding poetry generation type according to the number of keywords and their parts of speech comprises:
    obtaining the proportion of adjectives among the specified number of keywords;
    if the proportion of adjectives is less than 50%, setting the poetry generation type to five-character quatrain or five-character regulated verse; and
    if the proportion of adjectives is greater than or equal to 50%, setting the poetry generation type to seven-character quatrain or seven-character regulated verse.
  19. The storage medium according to claim 16, wherein the specified number is 4; and
    said acquiring the keywords in sequence and inputting the current keyword and the previous keyword into the LSTM model for encoding and decoding to obtain verses corresponding one-to-one to the keywords comprises:
    acquiring the first keyword, and inputting the first keyword into the LSTM model for encoding and decoding to obtain a first verse corresponding to the first keyword;
    acquiring the second keyword, and inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain a second verse corresponding to the second keyword;
    acquiring the third keyword, and inputting both the third keyword and the second keyword into the LSTM model for encoding and decoding to obtain a third verse corresponding to the third keyword; and
    acquiring the fourth keyword, and inputting both the fourth keyword and the third keyword into the LSTM model for encoding and decoding to obtain a fourth verse corresponding to the fourth keyword.
  20. The storage medium according to claim 19, wherein said inputting both the second keyword and the first keyword into the LSTM model for encoding and decoding to obtain the second verse corresponding to the second keyword comprises:
    inputting both the second keyword and the first keyword into the first-layer LSTM structure of the LSTM model for encoding to obtain a sequence of hidden states;
    inputting the sequence of hidden states into the second-layer LSTM structure of the LSTM model for decoding to obtain the verse's word sequence; and
    concatenating the verse's word sequence in order to obtain the second verse.
PCT/CN2018/102383 2018-03-15 2018-08-27 Automatic poetry generation method and apparatus, computer device and storage medium WO2019174186A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810213456.4 2018-03-15
CN201810213456.4A CN108415893B (zh) Automatic poetry generation method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2019174186A1 true WO2019174186A1 (zh) 2019-09-19

Family

ID=63131514

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102383 WO2019174186A1 (zh) 2018-03-15 2018-08-27 诗歌自动生成方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN108415893B (zh)
WO (1) WO2019174186A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738061A (zh) Classical poetry generation method, apparatus, device, and storage medium
CN111444679A (zh) Poetry generation method and apparatus, electronic device, and storage medium
CN112036192A (zh) Classical poetry generation method, apparatus, and storage medium
CN112989812A (zh) Distributed poetry generation method based on a cloud data center
CN113420555A (zh) Automatic poetry generation method based on soft constraints
CN115310426A (zh) Poetry generation method and apparatus, electronic device, and readable storage medium
CN116628256A (zh) Poetry classification method and system for a database platform

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108415893B (zh) Automatic poetry generation method and apparatus, computer device and storage medium
CN109582952B (zh) Poetry generation method and apparatus, computer device, and medium
CN110263150B (zh) Text generation method and apparatus, computer device, and storage medium
CN111950255B (zh) Poetry generation method, apparatus, device, and storage medium
CN110852086B (zh) Artificial-intelligence-based classical poetry generation method, apparatus, device, and storage medium
CN111368514B (zh) Model training and classical poem generation methods, classical poem generation apparatus, device, and medium
CN111709229B (zh) Artificial-intelligence-based text generation method and apparatus, computer device, and medium
CN111859916B (zh) Classical-poetry keyword extraction and verse generation methods, apparatus, device, and medium
CN112101006B (zh) Poetry generation method and apparatus, computer device, and storage medium
CN112784599B (zh) Verse generation method and apparatus, electronic device, and storage medium
CN113312448B (zh) Poetry generation method and system, and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105955964A (zh) Method and apparatus for automatically generating poetry
CN106227714A (zh) Artificial-intelligence-based method and apparatus for obtaining keywords for poetry generation
CN106528858A (zh) Lyric generation method and apparatus
CN108415893A (zh) Automatic poetry generation method and apparatus, computer device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229684B (zh) Sentence classification method and system, electronic device, refrigerator, and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738061A (zh) Classical poetry generation method, apparatus, device, and storage medium
CN110738061B (zh) Classical poetry generation method, apparatus, device, and storage medium
CN111444679A (zh) Poetry generation method and apparatus, electronic device, and storage medium
CN111444679B (zh) Poetry generation method and apparatus, electronic device, and storage medium
CN112036192A (zh) Classical poetry generation method, apparatus, and storage medium
CN112989812A (zh) Distributed poetry generation method based on a cloud data center
CN112989812B (zh) Distributed poetry generation method based on a cloud data center
CN113420555A (zh) Automatic poetry generation method based on soft constraints
CN115310426A (zh) Poetry generation method and apparatus, electronic device, and readable storage medium
CN116628256A (zh) Poetry classification method and system for a database platform

Also Published As

Publication number Publication date
CN108415893B (zh) 2019-09-20
CN108415893A (zh) 2018-08-17

Similar Documents

Publication Publication Date Title
WO2019174186A1 (zh) Automatic poetry generation method and apparatus, computer device and storage medium
US11120801B2 (en) Generating dialogue responses utilizing an independent context-dependent additive recurrent neural network
US11816442B2 (en) Multi-turn dialogue response generation with autoregressive transformer models
US11900056B2 (en) Stylistic text rewriting for a target author
US10824815B2 (en) Document classification using attention networks
US10909327B2 (en) Unsupervised learning of interpretable conversation models from conversation logs
Yi et al. Text style transfer via learning style instance supported latent space
US20170308790A1 (en) Text classification by ranking with convolutional neural networks
WO2022188584A1 (zh) Similar-sentence generation method and apparatus based on a pre-trained language model
US10482185B1 (en) Methods and arrangements to adjust communications
CN111680159A (zh) Data processing method and apparatus, and electronic device
US10372763B2 (en) Generating probabilistic annotations for entities and relations using reasoning and corpus-level evidence
US11636272B2 (en) Hybrid natural language understanding
US20210133279A1 (en) Utilizing a neural network to generate label distributions for text emphasis selection
WO2021034376A1 (en) Example based entity extraction, slot filling and value recommendation
WO2021238337A1 (zh) Method and apparatus for entity labeling
KR20210158815A (ko) 트리플 샘플 생성 방법, 장치, 전자 기기 및 기록 매체
US11562150B2 (en) Language generation method and apparatus, electronic device and storage medium
CN116450813B (zh) Text key-information extraction method, apparatus, device, and computer storage medium
US11587567B2 (en) User utterance generation for counterfactual analysis and improved conversation flow
US11475335B2 (en) Cognitive data preparation for deep learning model training
US9581644B2 (en) Digital IC simulation
CN111860862A (zh) Performing hierarchical simplification of a learning model
Dong et al. Chinese NER by Span-Level Self-Attention
CN114841162B (zh) Text processing method, apparatus, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18909751

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18909751

Country of ref document: EP

Kind code of ref document: A1