WO2019218660A1 - Article generation - Google Patents

Article generation

Info

Publication number
WO2019218660A1
Authority
WO
WIPO (PCT)
Prior art keywords
topic
sentence
dimension vector
vector
article
Prior art date
Application number
PCT/CN2018/121310
Other languages
English (en)
French (fr)
Inventor
富饶
徐娟
汪非易
侯培旭
于志安
汤彪
张弓
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2019218660A1
Priority to US17/097,405 (US11288454B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Definitions

  • the invention relates to artificial intelligence technology, in particular to an article generation method, device and computer storage medium.
  • Current article publishing platforms mainly obtain articles in the following three ways: 1. manual writing; 2. crawling article headlines from external sources; 3. generation by template stitching.
  • Chinese Patent Publication No. CN106874248A discloses an artificial intelligence-based article generating method and apparatus. After a template library is pre-established according to an article corpus, a target basic framework is selected from the template library; a search is performed according to the object described in each paragraph of the target basic framework and the fields involved, the content of each field corresponding to the object is obtained, and each retrieved field content is filled into the corresponding paragraph position of the target basic framework to obtain the main body of the article; finally, a target title for the main body is obtained from a title library, and the title and the main body are spliced to generate the article. Since the basic framework indicates the objects in the corresponding category, the object described in each paragraph, and the fields involved in each paragraph, the article can be generated automatically once the basic framework is filled according to the field contents in a preset database.
  • One of the objects of the present invention is to achieve automatic writing of articles based on user needs.
  • A method for generating an article includes: mining a content source based on demand information input by a user; extracting at least one topic dimension vector from the mined content source by using a specific topic generation model; for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain a topic sentence corresponding to the topic dimension vector; and splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • Mining the content source based on the demand information input by the user comprises: parsing the demand information to obtain an article description object and/or condition information; mining a point of interest that matches the article description object and/or condition information; and determining user-originated content corresponding to the point of interest as the content source.
  • Parsing the demand information comprises at least one of: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
  • Determining user-originated content (UGC, User Generated Content) corresponding to a point of interest (POI) as the content source includes: acquiring a user rating of each of a plurality of points of interest that match the article description object and/or condition information; selecting, from the plurality of points of interest, a target point of interest that satisfies a specific screening condition according to the user rating of each point of interest; and determining the user-originated content corresponding to the target point of interest as the content source.
  • Extracting the at least one topic dimension vector from the content source by using a specific topic generation model includes: extracting, by the specific topic generation model, at least one topic description word from the user-originated content included in the content source; and converting the extracted topic description word into the at least one topic dimension vector by using a first word vector conversion model.
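  • Performing topic sentence mining on the content source according to the topic dimension vector to obtain the topic sentence corresponding to the topic dimension vector includes: performing sentence segmentation and/or a filtering operation on the user-originated content in the content source to obtain one or more candidate topic sentences corresponding to the topic dimension vector; calculating a cosine similarity between the sentence vector of each candidate topic sentence and the topic dimension vector; and determining a candidate topic sentence whose cosine similarity meets a specific threshold as a topic sentence corresponding to the topic dimension vector.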
  • Calculating the cosine similarity between the sentence vector of the candidate topic sentence and the topic dimension vector includes: performing word segmentation on the candidate topic sentence to obtain a word segmentation result; converting the word segmentation result into a sentence vector by a second word vector conversion model; and calculating the cosine similarity between the converted sentence vector and the topic dimension vector.
  • Splicing and combining the topic sentences corresponding to the at least one topic dimension vector includes: screening the topic sentences corresponding to each topic dimension vector according to a topic core word and/or a specific sentiment analysis algorithm; and splicing and combining the topic sentences retained by the screening.
  • Screening the topic sentences corresponding to the topic dimension vector according to the topic core word includes: for any topic sentence corresponding to the topic dimension vector, performing dependency syntax analysis on the topic sentence to obtain its topic core word; determining whether the topic core word belongs to the topic dimension vector corresponding to the topic sentence; if yes, retaining the topic sentence; otherwise, filtering out the topic sentence.
  • Screening the topic sentences corresponding to the topic dimension vector according to a specific sentiment analysis algorithm includes: for any topic sentence corresponding to the topic dimension vector, determining a sentiment tendency score of the topic sentence by using the specific sentiment analysis algorithm; determining whether the sentiment of the topic sentence is positive according to the sentiment tendency score; if yes, retaining the topic sentence; otherwise, filtering out the topic sentence.
  • Splicing and combining the topic sentences corresponding to the at least one topic dimension vector comprises: determining a degree of thought relevance between each topic sentence and the corresponding topic dimension vector; selecting, according to that degree of thought relevance, a core topic sentence from the topic sentences corresponding to each topic dimension vector; and splicing and combining the core topic sentences.
  • the degree of thought relevance between the topic sentence and the corresponding topic dimension vector is determined, including: through a recurrent neural network (RNN) and/or a keyword extraction algorithm (TextRank), determining the degree of thought relevance between the topic sentence and the corresponding topic dimension vector.
  • Before the topic sentences of each topic dimension vector are spliced and combined, the method further comprises: using Maximal Marginal Relevance (MMR) to deduplicate the obtained topic sentences corresponding to each topic dimension vector.
  • The method further comprises: selecting picture information associated with the at least one topic dimension vector from a specific picture library; and fusing the picture information with the generated article to form an article with mixed text and pictures.
  • The method further comprises: finding a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information; and generating an article title through a specific template rule by using the specific object attribute and/or sentence corpus found.
  • An article generating apparatus includes: a content source mining device, configured to mine a content source based on demand information input by a user; a topic dimension vector extracting device, configured to extract at least one topic dimension vector from the content source by using a specific topic generation model; a topic sentence mining device, configured to, for each extracted topic dimension vector, perform topic sentence mining on the content source according to the topic dimension vector to obtain a topic sentence corresponding to the topic dimension vector; and a topic sentence splicing generating device, configured to splice and combine the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • The content source mining device includes: a parsing module, configured to parse the demand information to obtain an article description object and/or condition information; a mining module, configured to mine a point of interest that matches the article description object and/or condition information; and a determination module, configured to determine user-originated content corresponding to the point of interest as the content source.
  • The parsing of the demand information by the parsing module includes at least one of: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
  • The determination module is further configured to: after the mining module mines a plurality of points of interest that match the article description object and/or condition information, acquire a user rating of each of the points of interest; select, from the plurality of points of interest and according to the user rating of each point of interest, a target point of interest that satisfies a specific screening condition; and determine the user-originated content corresponding to the target point of interest as the content source.
  • The topic dimension vector extracting device includes: an extracting module, configured to extract, by using the specific topic generation model, at least one topic description word from the user-originated content included in the content source; and a conversion module, configured to convert the extracted topic description word into the at least one topic dimension vector by a first word vector conversion model.
  • The topic sentence mining device includes: a clause processing module, configured to perform sentence segmentation and/or a filtering operation on the user-originated content in the content source to obtain one or more candidate topic sentences corresponding to the topic dimension vector; a calculation module, configured to calculate a cosine similarity between the sentence vector of each candidate topic sentence and the topic dimension vector; and a determining module, configured to determine a candidate topic sentence whose calculated cosine similarity meets a specific threshold as a topic sentence corresponding to the topic dimension vector.
  • The calculation module is further configured to: perform word segmentation on the candidate topic sentence to obtain a word segmentation result corresponding to the candidate topic sentence; convert the word segmentation result into a sentence vector by using the second word vector conversion model; and calculate the cosine similarity between the converted sentence vector and the topic dimension vector.
  • The apparatus further includes a topic sentence screening device, configured to screen the topic sentences corresponding to each topic dimension vector according to a topic core word and/or a specific sentiment analysis algorithm before the topic sentence splicing generating device splices and combines the topic sentences corresponding to the topic dimension vector.
  • The topic sentence screening device is configured to: for any topic sentence corresponding to the topic dimension vector, perform dependency syntax analysis on the topic sentence to obtain its topic core word; determine whether the topic core word belongs to the topic dimension vector corresponding to the topic sentence; if yes, retain the topic sentence; otherwise, filter out the topic sentence;
  • or the topic sentence screening device is configured to: for any topic sentence corresponding to the topic dimension vector, determine a sentiment tendency score of the topic sentence by using a specific sentiment analysis algorithm; determine whether the sentiment of the topic sentence is positive according to the sentiment tendency score; if yes, retain the topic sentence; otherwise, filter out the topic sentence.
  • The topic sentence splicing generating device is further configured to: before splicing and combining the topic sentences corresponding to each topic dimension vector, determine a degree of thought relevance between each topic sentence and the corresponding topic dimension vector; select, according to that degree of thought relevance, a core topic sentence from the topic sentences corresponding to each topic dimension vector; and splice and combine the core topic sentences.
  • The topic sentence splicing generating device is further configured to determine the degree of thought relevance between the topic sentence and the corresponding topic dimension vector through a recurrent neural network (RNN) and/or a keyword extraction algorithm (TextRank).
  • The topic sentence splicing generating device is further configured to: before splicing and combining the topic sentences of each topic dimension vector, deduplicate the obtained topic sentences corresponding to each topic dimension vector by using the MMR method.
  • The apparatus further includes a picture-text fusion device, configured to select, from a specific picture library, picture information associated with the at least one topic dimension vector, and to fuse the picture information with the generated article to form an article with mixed text and pictures.
  • The apparatus further includes an article title generating device, configured to find a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information, and to generate the article title through a specific template rule by using the specific object attribute and/or sentence corpus found.
  • An article generating apparatus comprises: one or more processors; a memory; and a program stored in the memory which, when executed by the one or more processors, causes the processors to perform the method as described above.
  • a computer readable storage medium storing a program, when the program is executed by a processor, causes the processor to perform the method as described above.
  • Embodiments of the present disclosure effectively implement automatic article writing based on user requirements, with a short article generation cycle, low cost, and excellent quality.
  • FIG. 1 is a flow chart showing an article generating method according to an embodiment of the present invention
  • FIG. 2 is a flowchart showing an implementation of mining a content source based on demand information input by a user according to an embodiment of the present invention;
  • FIG. 3 shows the display effect of an article generated based on artificial intelligence, using “Panjia Liuzhou snail powder” as an application example;
  • FIG. 4 is a schematic diagram showing the structure of an article generating device according to an embodiment of the present invention.
  • FIG. 5 shows a schematic diagram of an article generating device according to an embodiment of the present invention
  • FIG. 6 shows a schematic diagram of a computer-readable storage medium for artificial intelligence-based article generation in accordance with an embodiment of the present invention.
  • FIG. 1 is a flow chart showing an article generating method according to an embodiment of the present invention. As shown in Figure 1, the method includes:
  • Operation 101: mining a content source based on the demand information input by the user.
  • Operation 102: extracting at least one topic dimension vector from the mined content source by using a specific topic generation model.
  • Operation 103: for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain a topic sentence corresponding to the topic dimension vector.
  • Operation 104: splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • In the process of generating an article that satisfies the user's needs, the user only needs to input the demand information to the terminal, and the terminal automatically performs operations 101-104 in response to that demand information.
  • The terminal may be a fixed electronic device such as a personal computer (PC), a portable electronic device such as a personal digital assistant (PDA), a laptop computer, or a tablet computer, or a smart mobile terminal such as a smart phone.
  • FIG. 2 is a flowchart showing an implementation of mining a content source based on demand information input by a user according to an embodiment of the present invention.
  • Mining the content source based on the demand information input by the user includes: operation 1011, parsing the demand information to obtain an article description object and/or condition information; operation 1012, mining a point of interest that matches the article description object and/or condition information; and operation 1013, determining user-originated content corresponding to the point of interest as the content source.
  • the demand information is any form of information input by the user.
  • the demand information may include information in different combinations of brand names, merchant names, recommended dishes, and geographic locations.
  • The information the user inputs to the terminal may include only items the user is interested in, such as an address, a category, or a brand, for example “Jing'an Temple Sea Fishing”; it may also include additional descriptions of the address, category, target, and so on, for example “the delicious hot pot near Zhongshan Park”.
  • Parsing the demand information in operation 1011 may include at least one of: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
  • one or more points of interest that match the article description object and/or condition information may be mined by way of comparison with the thesaurus.
  • the user-originated content corresponding to the point of interest can be directly determined as the content source.
  • For example, “Panjia Liuzhou snail powder” can be determined to be a popular merchant by matching against the thesaurus, so the user comments for that merchant can be directly determined as the content source for subsequent content mining.
  • When a plurality of points of interest are mined, the method further includes: acquiring a user rating of each of the plurality of points of interest; and screening out, from the plurality of points of interest and according to the user rating of each point of interest, a target point of interest that meets a specific screening condition. Determining the user-originated content corresponding to the point of interest as the content source then comprises: determining the user-originated content corresponding to the target point of interest as the content source.
  • Taking the user demand “2017 hot art film” as an example, three points of interest, “Fanghua”, “Gangrenboqi” and “Seventy-seven days”, are mined, with user ratings of 8.6, 7.5 and 7.3 respectively; “Fanghua”, which satisfies the “highest score” condition, can then be selected from the three points of interest as the target point of interest, and the user-originated content corresponding to “Fanghua” is determined as the content source.
  • Extracting at least one topic dimension vector from the content source by using a specific topic generation model includes: extracting, by the specific topic generation model, at least one topic description word from the user-originated content included in the content source; and converting the extracted topic description word into at least one topic dimension vector by a first word vector conversion model.
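  • The screening step in this example can be sketched as follows. This is a minimal illustration that assumes the screening condition is "highest user rating", as in the example above; the function name `select_target_poi` is hypothetical and not taken from the patent.

```python
from typing import Dict


def select_target_poi(ratings: Dict[str, float]) -> str:
    """Return the point of interest with the highest user rating.

    Assumption: the "specific screening condition" is "highest score".
    """
    if not ratings:
        raise ValueError("no candidate points of interest")
    return max(ratings, key=ratings.get)


# Ratings taken from the example in the text.
candidates = {"Fanghua": 8.6, "Gangrenboqi": 7.5, "Seventy-seven days": 7.3}
target = select_target_poi(candidates)
print(target)  # Fanghua
```

  • Other screening conditions (for example, a minimum rating threshold) would only change the selection expression.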
  • At least one topic description word may be extracted from user-originated content (such as an article) included in the content source by a specific topic generation model that combines term frequency (TF) with Latent Dirichlet Allocation (LDA), a three-layer Bayesian probability model. Those skilled in the art will understand how topic description words can be found based on the combination of TF and LDA.
  • General topic description words, including but not limited to price, environment, and service, can be extracted.
  • The topic description words for the topic dimension vectors can likewise be generated by the specific topic generation model that combines TF and LDA.
  • the extracted topic description word may be converted into at least one topic dimension vector by the word2vec model.
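  • As a rough illustration of the extraction step, the sketch below ranks candidate topic description words by term frequency only. It deliberately omits the LDA component and the word2vec conversion described above, both of which would require a trained model; the stop-word list, the English sample reviews, and the function name `top_topic_words` are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "is", "a", "and", "of", "was", "very"}


def top_topic_words(reviews: List[str], k: int) -> List[str]:
    """Return the k most frequent non-stop-word tokens across all reviews."""
    counts: Counter = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z]+", review.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


reviews = [
    "The taste is great and the service is good",
    "Great taste, friendly service",
    "The taste of the soup is rich, service was fast",
]
print(top_topic_words(reviews, 2))
```

  • In the described method each resulting word would then be mapped to a topic dimension vector by the first word vector conversion model.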
  • the at least one topic dimension vector includes at least two types, a general topic dimension vector and a featured topic dimension vector.
  • For example, a coffee shop may be mined for topics such as “service”, “taste”, “location”, “cat”, “Buffy”, and “environment”. Service, price, and environment are general topic dimensions, while “cat” and “Buffy” are featured topics of this coffee shop (because there are pets in the coffee shop and the dessert Buffy is popular).
  • As another example, for “Panjia Liuzhou snail powder”, the general topic dimensions of the merchant are likewise service, taste, environment, etc., while its featured topic dimensions that can be mined include “snail powder”, “fat intestine”, “corn sugar water”, etc.
  • Performing topic sentence mining on the content source according to the topic dimension vector to obtain a topic sentence corresponding to the topic dimension vector may include: performing sentence segmentation and/or a filtering operation on the user-originated content in the content source to obtain one or more candidate topic sentences corresponding to the topic dimension vector; calculating a cosine similarity between the sentence vector of each candidate topic sentence and the corresponding topic dimension vector; and taking a candidate topic sentence whose calculated cosine similarity meets the threshold as the topic sentence of the corresponding topic dimension vector.
  • Calculating a cosine similarity between the sentence vector of each candidate topic sentence and the corresponding topic dimension vector includes: performing word segmentation on the candidate topic sentence to obtain a word segmentation result corresponding to the candidate topic sentence; converting the word segmentation result into a sentence vector by a second word vector conversion model; and calculating the cosine similarity between the converted sentence vector and the corresponding topic dimension vector.
  • In one example, candidate topic sentences are obtained by sentence segmentation and filtering of the user-originated content and are then word-segmented; the word2vec model is called to convert the word segmentation result into a sentence vector; the cosine similarity between the sentence vector and the topic dimension vector is then calculated to obtain the topic sentence corresponding to the topic dimension vector, thereby pairing the topic dimension vector with the topic sentence.
  • It should be noted that the second word vector conversion model here may be the same word2vec model as the above-mentioned first word vector conversion model, the two being described as opposite conversion processes within the word2vec model.
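  • The cosine-similarity pairing described above can be sketched as follows. The three-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for a trained word2vec model, the sentence vector is assumed to be the mean of its word vectors (the patent does not specify the aggregation), and the threshold value is chosen only for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_VECTORS: Dict[str, List[float]] = {
    "taste": [1.0, 0.0, 0.0],
    "delicious": [0.9, 0.1, 0.0],
    "soup": [0.8, 0.2, 0.0],
    "waiter": [0.0, 1.0, 0.0],
    "friendly": [0.1, 0.9, 0.0],
}


def sentence_vector(tokens: Sequence[str],
                    vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of known tokens (assumed aggregation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_topic_sentences(candidates: List[List[str]],
                         topic_vector: Sequence[float],
                         vectors: Dict[str, List[float]],
                         threshold: float) -> List[List[str]]:
    """Keep candidate sentences whose similarity to the topic meets the threshold."""
    kept = []
    for tokens in candidates:
        vec = sentence_vector(tokens, vectors)
        if vec is not None and cosine(vec, topic_vector) >= threshold:
            kept.append(tokens)
    return kept


segmented = [["delicious", "soup"], ["friendly", "waiter"]]
matches = mine_topic_sentences(segmented, TOY_VECTORS["taste"], TOY_VECTORS, 0.8)
print(matches)  # [['delicious', 'soup']]
```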
  • The topic sentences corresponding to each topic dimension vector obtained by operations 102-103 may include positive content associated with the user's needs, and may also include negative content associated with the user's needs. From the perspective of user requirements, in order to achieve a positive publicity effect, the content the user expects to display should be positive. Therefore, before the topic sentences corresponding to the topic dimension vector are spliced and combined, the following operation is further needed: screening the topic sentences corresponding to each topic dimension vector according to the topic core word and/or the specific sentiment analysis algorithm.
  • Screening the topic sentences corresponding to each topic dimension vector according to the topic core word includes: for any topic sentence corresponding to each topic dimension vector, performing dependency syntax analysis on the topic sentence to obtain its topic core word; determining whether the topic core word belongs to the topic dimension vector corresponding to the topic sentence; if yes, retaining the topic sentence; otherwise, filtering out the topic sentence.
  • For example, the core word of a sentence is “flavored shrimp”, which clearly does not belong to the topic dimension vector “taste” corresponding to the sentence; the sentence merely contains the word “taste”, so it can be filtered out by using dependency syntax analysis.
  • Screening the topic sentences corresponding to each topic dimension vector according to a specific sentiment analysis algorithm includes: for any topic sentence corresponding to each topic dimension vector, determining a sentiment tendency score of the topic sentence by using the specific sentiment analysis algorithm; determining whether the sentiment of the topic sentence is positive according to the sentiment tendency score; if yes, retaining the topic sentence; otherwise, filtering out the topic sentence.
  • The specific sentiment analysis algorithm may be traditional sentiment analysis, which determines whether the sentence contains obvious negative words,
  • or deep-learning sentiment analysis, for example the BiLSTM algorithm, which judges the sentiment tendency in combination with word order.
  • Before the topic sentences corresponding to the topic dimension vector are spliced and combined, the method further includes: determining a degree of thought relevance between each topic sentence and the corresponding topic dimension vector; and selecting, according to that degree of thought relevance, a core topic sentence from the topic sentences corresponding to each topic dimension vector, so that the core topic sentences are spliced and combined.
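  • The traditional, negative-word variant of the sentiment screening can be sketched as follows. The lexicon `NEGATIVE_WORDS` and the English sample sentences are hypothetical; a real deployment would use a lexicon for the language of the reviews, and this simple check does not handle negation such as "not bad".

```python
import re
from typing import List

# Hypothetical negative-word lexicon used only for illustration.
NEGATIVE_WORDS = {"bad", "terrible", "dirty", "slow", "rude"}


def is_positive(sentence: str) -> bool:
    """Treat a sentence as positive if it contains no obvious negative word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return not any(token in NEGATIVE_WORDS for token in tokens)


def filter_positive(sentences: List[str]) -> List[str]:
    """Retain positive topic sentences and filter out the rest."""
    return [s for s in sentences if is_positive(s)]


kept = filter_positive(["The soup base is rich", "Service was slow and rude"])
print(kept)  # ['The soup base is rich']
```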
  • the degree of thought association between the topic sentence and the corresponding topic dimension vector may be determined by RNN and/or TextRank.
  • MMR is used to deduplicate the obtained topic sentences corresponding to each topic dimension vector; that is, the MMR method is used to judge the similarity of the candidate topic sentences within each topic dimension, so that the topic sentences retained in each topic dimension are as dissimilar from one another as possible.
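  • A minimal sketch of greedy MMR selection is given below. It assumes that each candidate already has a relevance score (for example, its cosine similarity to the topic dimension vector) and uses Jaccard overlap of word sets as the sentence-to-sentence similarity; both choices, the weight `lam`, and the function names are illustrative assumptions rather than details from the patent.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(sentences: Sequence[str], relevance: Sequence[float],
               k: int, lam: float = 0.5) -> List[str]:
    """Greedily pick up to k sentences, trading relevance against redundancy."""
    token_sets = [set(s.lower().split()) for s in sentences]
    selected: List[int] = []
    remaining = list(range(len(sentences)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to any already selected sentence.
        redundancy = max(
            (jaccard(token_sets[i], token_sets[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]


topic_sentences = [
    "the soup is very delicious",
    "the soup is really delicious",
    "the fried intestine is crispy",
]
picked = mmr_select(topic_sentences, [0.9, 0.85, 0.7], k=2)
print(picked)
```

  • In this example the second sentence is nearly a duplicate of the first, so the less relevant but more distinct third sentence is selected instead.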
  • topic sentences corresponding to the topic dimension vector are spliced and combined to generate an article that meets the requirement information.
  • For example, the merchant's general topic dimension “flavor” corresponds to the article content: “The taste is still very good; the freshly fried yuba, capers, and sour bamboo shoots are very good; the dry-mixed version has more flavor; the snail powder is very delicious; the soup base is rich and slightly spicy, and you can add some sour chili juice for a more sour and spicy taste; the ingredients are very rich; the fried large intestine is crunchy outside and crisp inside; people who are losing weight should eat only a little, because the calories are too high!”.
  • The generated article also includes the article content corresponding to the merchant's general topic dimension “environment”, and the article content corresponding to “snail powder”, “fat intestine” and “corn sugar water” under the merchant's featured topic dimension.
  • the article content of each topic dimension vector is no longer elaborated here.
  • The method further includes: selecting picture information associated with the at least one topic dimension vector from a specific picture library; and fusing the picture information with the generated article to form an article with mixed text and pictures.
  • The picture information associated with each topic dimension vector may be selected from a picture knowledge base. For example, continuing with “Panjia Liuzhou snail powder”, as shown in Figure 3, the generated article includes pictures corresponding to the merchant's general topic dimensions “taste” and “environment” and to “snail powder”, “fat intestine” and “corn sugar water” under the merchant's featured topic dimension “signature recommendation dish”.
  • the method further comprises: finding a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information; and generating an article title from the found object attribute and/or sentence corpus by means of specific template rules.
  • for a brand in the hot pot category, the article title "The hot pot that has swept the world! Won't you take a look?" can be assembled from specific template rules.
  • taking the merchant "Fenjia Liuzhou Luosifen" as another example, as shown in FIG. 3, the specific template rule for food-category merchants with an explicit brand name, "The editor's top-secret private delicacy! Yes, please! XXXX", automatically yields the article title "The editor's top-secret private delicacy! Yes, please! Fenjia Liuzhou Luosifen".
  • FIG. 4 is a schematic diagram showing the structure of an article generating apparatus according to an embodiment of the present invention.
  • the article generating device 40 includes: a content source mining device 401, configured to mine a content source based on demand information input by a user; a topic dimension vector extracting device 402, configured to extract at least one topic dimension vector from the mined content source using a specific topic generation model; a topic sentence mining device 403, configured to, for each extracted topic dimension vector, perform topic sentence mining on the content source according to the topic dimension vector and obtain topic sentences corresponding to the topic dimension vector; and
  • a topic sentence splicing generating device 404, configured to splice and combine the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • the content source mining device 401 includes: a parsing module 4011, configured to parse the demand information to obtain an article description object and/or condition information; a mining module 4012, configured to mine points of interest matching the article description object and/or condition information; and a determining module 4013, configured to determine user-generated content corresponding to the points of interest as the content source.
  • the parsing of the demand information by the parsing module 4011 includes at least one of: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
  • there may be multiple points of interest that match the article description object and/or condition information.
  • in that case, the determining module 4013 determining user-generated content corresponding to the points of interest as the content source includes: obtaining a user rating for each of the multiple points of interest that match the article description object and/or condition information; selecting from the multiple points of interest, according to the user rating of each, a target point of interest that satisfies a specific screening condition; and determining the user-generated content corresponding to the target point of interest as the content source.
  • the topic dimension vector extracting device 402 includes: an extraction module 4021, configured to extract at least one topic description word from the user-generated content included in the content source using the specific topic generation model; and a conversion module 4022, configured to convert the extracted topic description words into at least one topic dimension vector using a first word vector conversion model.
  • the topic sentence mining device 403 includes: a sentence splitting module 4031, configured to split the user-generated content in the content source into sentences and/or filter it, to obtain one or more candidate topic sentences corresponding to the topic dimension vector; a calculation module 4032, configured to calculate the cosine similarity between the sentence vector of each candidate topic sentence and the topic dimension vector; and a judging module 4033, configured to determine the candidate topic sentences whose calculated cosine similarity meets a specific threshold as the topic sentences of the corresponding topic dimension vector.
  • the calculation module 4032 is further configured to: perform word segmentation on a candidate topic sentence to obtain a word segmentation result for that sentence; convert the word segmentation result into a sentence vector using a second word vector conversion model; and calculate the cosine similarity between the resulting sentence vector and the topic dimension vector.
  • the device further includes a topic sentence screening device 405.
  • the topic sentence screening device 405 is configured to screen the topic sentences corresponding to each topic dimension vector according to topic core words and/or a specific sentiment analysis algorithm before the topic sentence splicing generating device 404 splices and combines them, so that only the topic sentences retained by the screening are spliced and combined.
  • the topic sentence screening device 405 screening the topic sentences corresponding to a topic dimension vector according to topic core words includes: for any topic sentence corresponding to the topic dimension vector, performing dependency parsing on the topic sentence to obtain its topic core word; determining whether the topic core word belongs to the topic dimension vector to which the topic sentence corresponds; if so, retaining the topic sentence; otherwise, filtering it out.
  • the topic sentence screening device 405 screening the topic sentences corresponding to each topic dimension vector according to the sentiment analysis algorithm includes: for any topic sentence corresponding to a topic dimension vector, determining a sentiment orientation score of the topic sentence using a specific sentiment analysis algorithm; determining from that score whether the sentiment of the topic sentence is positive; if so, retaining the topic sentence; otherwise, filtering it out.
  • the topic sentence splicing generating device 404 is further configured to, before splicing and combining the topic sentences corresponding to each topic dimension vector, determine the degree of thought relevance between each topic sentence and its corresponding topic dimension vector, and to select, according to that degree of thought relevance, core topic sentences from the topic sentences corresponding to each topic dimension vector, so that the core topic sentences are spliced and combined.
  • the topic sentence splicing generating device 404 is further configured to determine the degree of thought relevance between a topic sentence and its corresponding topic dimension vector using a recurrent neural network (RNN) and a keyword extraction algorithm (TextRank).
  • the topic sentence splicing generating device 404 is further configured to, before splicing and combining the topic sentences of each topic dimension vector, deduplicate the obtained topic sentences corresponding to each topic dimension vector using the matching mechanism (MMR).
  • the device further includes an image-text fusion device 406.
  • the image-text fusion device 406 is configured to select picture information associated with the at least one topic dimension vector from a specific picture library, and to fuse the picture information with the generated article to form an article that mixes images and text.
  • the apparatus further includes an article title generating means 407.
  • the article title generating means 407 is configured to find a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information, and to generate an article title from the found object attribute and/or sentence corpus by means of specific template rules.
  • the article generating device of the present invention may include at least one or more processors, and at least one memory.
  • the memory stores a program that, when executed by the processor, causes the processor to perform the steps described in this specification. For example, the processor can perform, as shown in FIG. 1: operation 101, mining a content source based on demand information input by a user; operation 102, extracting at least one topic dimension vector from the mined content source using a specific topic generation model; operation 103, for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector; and operation 104, splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • FIG. 5 shows a schematic diagram of an article generating device in accordance with an embodiment of the present invention.
  • an article generating device according to this embodiment of the present invention is described below with reference to FIG. 5.
  • the apparatus 500 shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • as shown in FIG. 5, device 500 takes the form of a general-purpose computing device, including but not limited to: the at least one processor 510, the at least one memory 520, and a bus 560 connecting the different system components (including memory 520 and processor 510).
  • Bus 560 includes an address bus, a control bus, and a data bus.
  • Memory 520 can include volatile memory, such as random access memory (RAM) 521 and/or cache memory 522, and can further include read only memory (ROM) 523.
  • the memory 520 can also include a set (at least one) of program modules 524, including but not limited to: an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • Device 500 can also be in communication with one or more external devices 50 (eg, a keyboard, pointing device, Bluetooth device, etc.). This communication can be performed via an input/output (I/O) interface 540 and displayed on display unit 530. Also, device 500 can communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) via network adapter 550. As shown, network adapter 550 communicates with other modules in device 500 via bus 560. It should be understood that although not shown in the figures, other hardware and/or software modules may be utilized in connection with device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives. And data backup storage systems, etc.
  • aspects of the invention may also be embodied in the form of a computer readable storage medium comprising program code which, when executed by a processor, causes the processor to perform the steps of the method described above. For example, the processor can perform, as shown in FIG. 1:
  • operation 101, mining a content source based on demand information input by a user;
  • operation 102, extracting at least one topic dimension vector from the mined content source using a specific topic generation model;
  • operation 103, for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector; and
  • operation 104, splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
  • the computer readable storage medium can employ any combination of one or more readable mediums.
  • the readable medium can be a readable signal medium or a readable storage medium.
  • the readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (non-exhaustive lists) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • FIG. 6 shows a schematic diagram of a computer readable storage medium based on artificial intelligence based article generation in accordance with an embodiment of the present invention.
  • as shown in FIG. 6, a computer readable storage medium 3 according to an embodiment of the present invention may take the form of a portable compact disk read only memory (CD-ROM) containing program code, and may be run on a terminal device such as a personal computer.
  • however, the computer readable storage medium of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus or device.
  • program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar.
  • the program code can execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • where a remote computing device is involved, it can be connected to the user computing device via any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, via the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

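An article generation method, device, and computer storage medium. According to the method, a content source is mined based on demand information input by a user (101); at least one topic dimension vector is extracted from the mined content source using a specific topic generation model (102); for each extracted topic dimension vector, topic sentence mining is performed on the content source according to the topic dimension vector to obtain topic sentences corresponding to that topic dimension vector (103); and the topic sentences corresponding to the at least one topic dimension vector are spliced and combined to generate an article that meets the demand information (104).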

Description

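Article Generation
Cross-Reference to Related Applications
This patent application claims priority to Chinese patent application No. 201810462391.7, filed on May 15, 2018 and entitled "Article Generation Method, Device and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to artificial intelligence technology, and in particular to an article generation method, device, and computer storage medium.
Background
As consumer spending continues to upgrade, low price is no longer the main factor in purchase decisions; users increasingly want diversified information to support those decisions. Headline articles containing diversified information have therefore become a sought-after need for major e-commerce and content platforms. Current article publishing platforms mainly obtain articles in three ways: 1. manual writing; 2. crawling headline articles from external sources; 3. generation by template splicing.
In addition, Chinese patent publication CN106874248A discloses an artificial-intelligence-based article generation method and apparatus. The method includes: after a template library has been built in advance from an article corpus, selecting a target basic framework from the template library; then searching a preset database according to the objects described and the fields involved in each paragraph of the target basic framework, to obtain the content of each field for those objects, and filling the retrieved field content into the corresponding paragraph positions of the target basic framework to obtain an article body; and finally splicing the article body together with a target title matched for it from a title library to generate the article. Because the basic framework indicates, for articles of the corresponding category, the objects each paragraph describes and the fields each paragraph involves, an article can be generated automatically once the framework is filled with field content from the preset database.
Summary
One object of the present invention is to write articles automatically based on user needs.
According to a first aspect of the present invention, an article generation method is provided, including: mining a content source based on demand information input by a user; extracting at least one topic dimension vector from the mined content source using a specific topic generation model; for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector; and splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.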
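According to an embodiment of the present invention, mining the content source based on the demand information input by the user includes: parsing the demand information to obtain an article description object and/or condition information; mining points of interest that match the article description object and/or condition information; and determining the user-generated content corresponding to the points of interest as the content source.
According to an embodiment of the present invention, parsing the demand information includes at least one of the following operations: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
According to an embodiment of the present invention, determining the user-generated content (UGC) corresponding to the points of interest (POI) as the content source includes: obtaining a user rating for each of multiple points of interest that match the article description object and/or condition information; selecting from the multiple points of interest, according to the user rating of each, a target point of interest that satisfies a specific screening condition; and determining the user-generated content corresponding to the target point of interest as the content source.
According to an embodiment of the present invention, extracting at least one topic dimension vector from the content source using the specific topic generation model includes: extracting at least one topic description word from the user-generated content included in the content source using the specific topic generation model; and converting the extracted topic description words into the at least one topic dimension vector using a first word vector conversion model.
According to an embodiment of the present invention, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector includes: splitting the user-generated content in the content source into sentences and/or filtering it, to obtain one or more candidate topic sentences corresponding to the topic dimension vector; calculating the cosine similarity between the sentence vector of each candidate topic sentence and the topic dimension vector; and determining the candidate topic sentences whose calculated cosine similarity meets a threshold as the topic sentences of the corresponding topic dimension vector.
According to an embodiment of the present invention, calculating the cosine similarity between the sentence vector of a candidate topic sentence and the topic dimension vector includes: performing word segmentation on the candidate topic sentence to obtain a word segmentation result; converting the word segmentation result into a sentence vector using a second word vector conversion model; and calculating the cosine similarity between the resulting sentence vector and the corresponding topic dimension vector.
According to an embodiment of the present invention, splicing and combining the topic sentences corresponding to the at least one topic dimension vector includes: screening the topic sentences corresponding to each topic dimension vector according to topic core words and/or a specific sentiment analysis algorithm; and splicing and combining the topic sentences retained by the screening.
According to an embodiment of the present invention, screening the topic sentences corresponding to a topic dimension vector according to topic core words includes: for any topic sentence corresponding to the topic dimension vector, performing dependency parsing on the topic sentence to obtain its topic core word; determining whether the topic core word belongs to the topic dimension vector to which the topic sentence corresponds; if so, retaining the topic sentence; otherwise, filtering it out.
According to an embodiment of the present invention, screening the topic sentences corresponding to a topic dimension vector according to a specific sentiment analysis algorithm includes: for any topic sentence corresponding to the topic dimension vector, determining a sentiment orientation score of the topic sentence using the specific sentiment analysis algorithm; determining from that score whether the sentiment of the topic sentence is positive; if so, retaining the topic sentence; otherwise, filtering it out.
According to an embodiment of the present invention, splicing and combining the topic sentences corresponding to the at least one topic dimension vector includes: determining the degree of thought relevance between each topic sentence and its corresponding topic dimension vector; and selecting, according to that degree of thought relevance, core topic sentences from the topic sentences corresponding to each topic dimension vector, so that the core topic sentences are spliced and combined.
According to an embodiment of the present invention, determining the degree of thought relevance between a topic sentence and its corresponding topic dimension vector includes: determining it using a recurrent neural network (RNN) and/or a keyword extraction algorithm (TextRank).
According to an embodiment of the present invention, before the topic sentences of each topic dimension vector are spliced and combined, the method further includes: deduplicating the obtained topic sentences corresponding to each topic dimension vector using a matching mechanism (MMR, Match Making Rating).
According to an embodiment of the present invention, the method further includes: selecting picture information associated with the at least one topic dimension vector from a specific picture library; and fusing the picture information with the generated article to form an article that mixes images and text.
According to an embodiment of the present invention, the method further includes: finding a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information; and generating an article title from the found object attribute and/or sentence corpus by means of specific template rules.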
根据本发明第二方面,提供一种文章生成设备,包括:内容源挖掘装置,用于基于 用户输入的需求信息挖掘内容源;主题维度向量提取装置,用于利用特定主题生成模型,从挖掘出的所述内容源中提取至少一个主题维度向量;主题句挖掘装置,用于针对所提取的每一个所述主题维度向量,依据所述主题维度向量对所述内容源进行主题句挖掘,得到对应于所述主题维度向量的主题句;主题句拼接生成装置,用于对对应于所述至少一个主题维度向量的所述主题句进行拼接合成,生成符合所述需求信息的文章。
根据本发明的一个实施方式,其中,所述内容源挖掘装置包括:解析模块,用于解析所述需求信息,得到文章描述对象和/或条件信息;挖掘模块,用于挖掘与所述文章描述对象和/或条件信息相匹配的兴趣点;确定模块,用于将对应于所述兴趣点的用户原创内容确定为内容源。
根据本发明的一个实施方式,其中,所述解析模块解析所述需求信息包括以下操作至少之一:对所述需求信息进行分词;对所述需求信息进行词性分析;对所述需求信息进行命名实体识别;以及对所述需求信息进行语义归一化处理。
根据本发明的一个实施方式,其中,所述确定模块还用于,在通过所述挖掘模块挖掘与所述文章描述对象和/或条件信息相匹配的多个兴趣点之后,获取与所述多个兴趣点中每个兴趣点的用户评分;根据每个所述兴趣点的用户评分从所述多个兴趣点中筛选出满足特定筛选条件的目标兴趣点;将对应于所述目标兴趣点的用户原创内容确定为内容源。
根据本发明的一个实施方式,其中,所述主题维度向量提取装置包括:提取模块,用于通过所述特定主题生成模型,从所述内容源所包括的用户原创内容中,提取至少一个主题描述词;转换模块,用于通过第一词向量转换模型将所提取的所述主题描述词转换为所述至少一个主题维度向量。
根据本发明的一个实施方式,其中,所述主题句挖掘装置包括:分句处理模块,用于对所述内容源中的用户原创内容进行分句和/或过滤操作,得到与所述主题维度向量相对应的一个或多个候选主题句;计算模块,用于计算每一个所述候选主题句的句子向量与所述主题维度向量之间的余弦相似度;判断模块,用于将所计算出的所述余弦相似度符合特定阈值的所述候选主题句,确定作为对应主题维度向量的主题句。
根据本发明的一个实施方式,其中,所述计算模块还用于,针对所述候选主题句进行分词处理,得到对应于所述候选主题句的分词结果;通过第二词向量转换模型将所述分词结果转换成句子向量;计算所转换成的所述句子向量和所述主题维度向量之间的余 弦相似度。
根据本发明的一个实施方式,其中,所述设备还包括主题句筛选装置;所述主题句筛选装置,用于在通过所述主题句拼接生成装置对对应于主题维度向量的主题句进行拼接合成之前,根据主题核心词和/或特定情感分析算法,对对应于每一个所述主题维度向量的所述主题句进行筛选。
根据本发明的一个实施方式,其中,所述主题句筛选装置被配置为:针对对应于所述主题维度向量的任意一个主题句,对所述主题句进行依存句法分析,得到所述主题句的主题核心词;判断所述主题句的所述主题核心词是否属于所述主题句所对应的所述主题维度向量;如果是,则保留所述主题句;否则,过滤所述主题句;
根据本发明的一个实施方式,其中,所述主题句筛选装置被配置为:针对对应于所述主题维度向量的任意一个主题句,利用特定情感分析算法确定所述主题句的情感倾向分;根据所述主题句的情感倾向分判断所述主题句的情感是否呈正向;如果是,则保留所述主题句;否则,过滤所述主题句。
根据本发明的一个实施方式,其中,所述主题句拼接生成装置还用于,在对对应于每一个所述主题维度向量的主题句进行拼接合成之前,确定所述主题句与对应的所述主题维度向量之间的思想关联度;根据所述主题句与对应的所述主题维度向量之间的思想关联度,从对应于每一个所述主题维度向量的主题句中挑选出核心主题句,以对所述核心主题句进行拼接合成。
根据本发明的一个实施方式,其中,所述主题句拼接生成装置还用于,通过循环神经网络RNN和关键词提取算法TextRank,确定所述主题句与对应的所述主题维度向量之间的思想关联度。
根据本发明的一个实施方式,其中,所述主题句拼接生成装置还用于,在对所述每一个所述主题维度向量的所述主题句进行拼接合成之前,采用MMR方法来对所得到的对应于每一个所述主题维度向量的所述主题句进行去重处理。
根据本发明的一个实施方式,其中,所述设备还包括图文融合装置;所述图文融合装置,用于从特定图片库中挑选与所述至少一个主题维度向量相关联的图片信息;将所述图片信息与所生成的所述文章进行融合,以形成图文混合的文章。
根据本发明的一个实施方式,其中,所述设备还包括文章标题生成装置;所述文章标题生成装置,用于查找与所生成的所述文章的内容或所述需求信息相匹配的特定对象 定语和/或句子语料;利用查找到的特定对象定语和/或句子语料,通过特定模板规则生成文章标题。
根据本发明第三方面,提供一种文章生成设备,包括:一个或者多个处理器;存储器;存储在所述存储器中的程序,当被所述一个或者多个处理器执行时,所述程序使所述处理器执行如上所述的方法。
根据本发明第四方面,提供一种计算机可读存储介质,所述计算机可读存储介质存储有程序,当所述程序被处理器执行时,使得所述处理器执行如上所述的方法。
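By mining a content source based on demand information input by a user, then obtaining from the mined content source article content that is strongly related to the user's needs, and generating an article that meets the demand information, the embodiments of the present disclosure achieve automatic article writing based on user needs, with a short generation cycle, low cost, and high quality.
It should be understood that the teachings of the present invention need not achieve all of the benefits described above; a particular technical solution may achieve a particular technical effect, and other embodiments of the present invention may achieve benefits not mentioned above.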
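Brief Description of the Drawings
The above and other objects, features and advantages of the exemplary embodiments of the present invention will become easier to understand by reading the following detailed description with reference to the accompanying drawings. The drawings show several embodiments of the present invention by way of example and not limitation, in which:
In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.
FIG. 1 shows a flowchart of an article generation method according to an embodiment of the present invention;
FIG. 2 shows a flowchart of mining a content source based on demand information input by a user according to an embodiment of the present invention;
FIG. 3 shows the display of an article generated by artificial intelligence in an application example of the present invention, taking "Fenjia Liuzhou Luosifen" (粉家柳州螺蛳粉) as the example;
FIG. 4 shows a schematic structural diagram of an article generating device according to an embodiment of the present invention;
FIG. 5 shows a schematic diagram of an article generating device according to an embodiment of the present invention;
FIG. 6 shows a schematic diagram of a computer readable storage medium for artificial-intelligence-based article generation according to an embodiment of the present invention.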
具体实施方式
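The principles and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and thereby implement the invention, not to limit its scope in any way. Rather, they are provided so that the invention will be thorough and complete, and will fully convey its scope to those skilled in the art.
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a flowchart of an article generation method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Operation 101: mining a content source based on demand information input by a user.
Operation 102: extracting at least one topic dimension vector from the mined content source using a specific topic generation model.
Operation 103: for each extracted topic dimension vector, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector.
Operation 104: splicing and combining the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the demand information.
Here, to generate an article that meets the user's needs, the user only has to enter the demand information into a terminal; the terminal then carries out operations 101 to 104 automatically in response to that input.
The terminal may be a fixed electronic device such as a personal computer (PC), a portable electronic device such as a personal digital assistant (PDA), a laptop or a tablet, or a smart mobile terminal such as a smartphone.
Operation 101, which the terminal performs in response to the demand information input by the user, is described in detail first.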
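FIG. 2 shows a flowchart of mining a content source based on demand information input by a user according to an embodiment of the present invention. As shown in FIG. 2, in one possible implementation, mining the content source based on the demand information input by the user includes: operation 1011, parsing the demand information to obtain an article description object and/or condition information; operation 1012, mining points of interest that match the article description object and/or condition information; and operation 1013, determining the user-generated content corresponding to the points of interest as the content source.
It should be understood that the demand information may be information of any form input by the user. For example, it may combine information of types such as brand name, merchant name, recommended dish and geographic location in various ways. In practice, the demand information a user enters may contain only the address, category, brand or similar item the user is interested in, such as "Haidilao at Jing'an Temple" (静安寺海底捞); it may also contain information that further qualifies the address, category or brand, such as "tasty hot pot near Zhongshan Park" (中山公园附近好吃的火锅).
Parsing the demand information in operation 1011 may include at least one of the following operations: performing word segmentation on the demand information; performing part-of-speech analysis on the demand information; performing named entity recognition on the demand information; and performing semantic normalization on the demand information.
Those skilled in the art will understand that, depending on the form of the user's input, some or all of the operations listed above are performed selectively when parsing the demand information. In addition, obtaining the article description object and/or condition information involves matching against a large lexicon mined and stored offline, in order to find key information such as brand names, merchant names, recommended dishes and geographic locations. For example, for the user input "tasty hot pot near Zhongshan Park" (中山公园附近好吃的火锅), parsing yields the category word "hot pot" (火锅) as the article description object and the geographic location "Zhongshan Park" (中山公园) as condition information.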
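The lexicon-matching step can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon and a forward maximum matching scan in place of real word segmentation and named entity recognition; the names `LEXICON`, `OBJECT_TYPES` and `parse_demand`, and the type labels, are hypothetical and do not come from the source.

```python
# Hypothetical lexicon mapping known terms to a type label.
# A real system would use a large lexicon mined offline, as the text describes.
LEXICON = {
    "中山公园": "location",
    "静安寺": "location",
    "火锅": "category",
    "海底捞": "brand",
}

# Assumption: brand/merchant/category/dish terms describe the article object,
# while locations act as condition information.
OBJECT_TYPES = {"brand", "merchant", "category", "dish"}


def parse_demand(text, lexicon=LEXICON):
    """Scan the text with forward maximum matching against the lexicon and
    split the hits into article description objects and conditions."""
    max_len = max(len(term) for term in lexicon)
    objects, conditions = [], []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in lexicon:
                target = objects if lexicon[chunk] in OBJECT_TYPES else conditions
                target.append(chunk)
                i += size
                break
        else:
            i += 1  # no lexicon entry starts here; move on one character
    return {"objects": objects, "conditions": conditions}


result = parse_demand("中山公园附近好吃的火锅")
```

For the example input from the text, `result` holds "火锅" as the description object and "中山公园" as a condition. Longest-match-first keeps a short lexicon entry from pre-empting a longer one that starts at the same position.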
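In implementing operations 1012 and 1013, matching against the lexicon may yield one or more points of interest that match the article description object and/or condition information. When a single point of interest is found, the user-generated content corresponding to it can be used directly as the content source. For example, "Fenjia Liuzhou Luosifen" (粉家柳州螺蛳粉) can be identified through lexicon matching as a popular merchant, so user reviews of that merchant can be used directly as the content source for subsequent content mining. When multiple points of interest are found, they can first be screened to select the best, laying the groundwork for generating a highly relevant article.
According to one possible implementation of the present invention, after multiple points of interest matching the article description object and/or condition information have been mined, and before the user-generated content in the content information library corresponding to the points of interest is determined as the content source, the method further includes: obtaining a user rating for each of the multiple points of interest; and selecting from the multiple points of interest, according to the user rating of each, a target point of interest that satisfies a specific screening condition. Determining the user-generated content corresponding to the points of interest as the content source then includes: determining the user-generated content corresponding to the target point of interest as the content source.
In one example, for the user demand "2017 popular art-house films" (2017热点文艺片), mining yields three points of interest, 《芳华》, 《冈仁波齐》 and 《七十七天》, with user ratings of 8.6, 7.5 and 7.3 respectively. 《芳华》, which satisfies the "highest rating" condition, can then be selected as the target point of interest, and the user-generated content corresponding to 《芳华》 is determined as the content source.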
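The screening in this example can be sketched in a few lines. The function name `select_target_poi` is hypothetical, and "highest rating" is only the one screening condition the example names; other conditions would need a different rule.

```python
def select_target_poi(poi_ratings):
    """Return the point of interest with the highest user rating.

    poi_ratings maps a POI name to its user rating. This implements only the
    'highest rating' screening condition used in the example."""
    if not poi_ratings:
        raise ValueError("no candidate points of interest")
    return max(poi_ratings, key=poi_ratings.get)


# Ratings taken from the example in the text.
ratings = {"芳华": 8.6, "冈仁波齐": 7.5, "七十七天": 7.3}
target = select_target_poi(ratings)
```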
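The process of mining a content source based on the user's demand information in operation 101 has been described in detail above; it provides a high-quality content source for subsequent article generation. The following describes, through the implementation of operations 102 to 104, how article content strongly related to the user's needs is obtained from the content source.
In operation 102, extracting at least one topic dimension vector from the content source using the specific topic generation model includes: extracting at least one topic description word from the user-generated content included in the content source using the specific topic generation model; and converting the extracted topic description words into at least one topic dimension vector using a first word vector conversion model.
Specifically, at least one topic description word can be extracted from the user-generated content in the content source (such as article topics and/or corpus text) by a specific topic generation model that combines term frequency (TF) with latent Dirichlet allocation (LDA), a three-level Bayesian probabilistic model. Those skilled in the art will understand that, when topic description words are discovered by combining TF and LDA in this way, a manually built generic topic knowledge base can be consulted to extract topic description words for generic topic dimension vectors including, but not limited to, price, environment and service.
Further, after at least one topic description word has been extracted, the extracted words can be converted into at least one topic dimension vector by a word2vec model. The topic dimension vectors include at least two kinds: generic topic dimension vectors and feature topic dimension vectors. For example, a coffee shop might yield topic dimensions such as "service" (服务), "taste" (味道), "location" (位置), "cats" (猫咪), "parfait" (芭菲) and "environment" (环境); service, price and environment are generic topic dimensions, while "cats" and "parfait" are feature topic dimensions of this particular coffee shop (because it keeps pets and parfait is its popular dessert). As another example, for "Fenjia Liuzhou Luosifen", as shown in FIG. 3, the merchant's generic topic dimensions are likewise service, taste, environment and so on, while the feature topic dimensions that can be mined include "luosifen" (螺蛳粉), "pork intestine" (肥肠) and "corn sweet soup" (玉米糖水).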
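The split between generic and feature topic dimensions can be sketched as follows. This covers only the term-frequency ranking and the knowledge-base lookup: the LDA stage and the word2vec conversion are omitted, reviews are assumed to be already segmented into words, and `GENERIC_TOPICS`, `top_topic_words` and `split_dimensions` are hypothetical names.

```python
from collections import Counter

# Hypothetical stand-in for the manually built generic topic knowledge base.
GENERIC_TOPICS = {"服务", "价格", "环境", "口味", "味道", "位置"}


def top_topic_words(tokenized_reviews, candidates, k=5):
    """Rank candidate topic words by term frequency over segmented reviews.

    `candidates` stands in for the words a topic model such as LDA would
    propose; that stage is not implemented in this sketch."""
    counts = Counter(
        token
        for review in tokenized_reviews
        for token in review
        if token in candidates
    )
    return [word for word, _ in counts.most_common(k)]


def split_dimensions(topic_words, generic=GENERIC_TOPICS):
    """Separate generic topic dimensions from merchant-specific feature ones."""
    generic_dims = [w for w in topic_words if w in generic]
    feature_dims = [w for w in topic_words if w not in generic]
    return generic_dims, feature_dims


reviews = [["服务", "猫咪", "好"], ["猫咪", "芭菲"], ["服务", "环境", "猫咪"]]
words = top_topic_words(reviews, {"服务", "猫咪", "芭菲", "环境"}, k=4)
generic_dims, feature_dims = split_dimensions(words)
```

With the coffee shop data above, "猫咪" and "芭菲" end up as feature dimensions, while "服务" and "环境" are recognized as generic ones.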
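In operation 103, performing topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector may include: splitting the user-generated content in the content source into sentences and/or filtering it, to obtain one or more candidate topic sentences corresponding to the topic dimension vector; calculating the cosine similarity between the sentence vector of each candidate topic sentence and the corresponding topic dimension vector; and taking the candidate topic sentences whose calculated cosine similarity meets a threshold as the topic sentences of the corresponding topic dimension vector.
According to one possible implementation of the present invention, calculating the cosine similarity between the sentence vector of each candidate topic sentence and the corresponding topic dimension vector includes: performing word segmentation on the candidate topic sentence to obtain a word segmentation result for that sentence; converting the word segmentation result into a sentence vector using a second word vector conversion model; and calculating the cosine similarity between the resulting sentence vector and the corresponding topic dimension vector.
Specifically, the user-generated content is first split into sentences and filtered, and the candidate topic sentences are then segmented into words; the word2vec model is called to convert the segmentation result into a sentence vector; and the cosine similarity between the sentence vector and the topic dimension vector is calculated to obtain the topic sentences corresponding to that topic dimension vector, thereby pairing topic dimension vectors with topic sentences. Those skilled in the art will understand that the second word vector conversion model here and the first word vector conversion model mentioned above may both be a word2vec model, the two being mutually inverse conversion processes within the word2vec model.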
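A minimal sketch of the similarity test follows. The source does not say how word vectors are combined into a sentence vector, so averaging is an assumption here; the toy `word_vectors` table stands in for a trained word2vec model, and the function names and the 0.8 threshold are hypothetical.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have one (an assumption; the
    source does not specify the pooling). Returns None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|); defined as 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_topic_sentences(candidates, topic_vector, word_vectors, threshold=0.8):
    """Keep the segmented candidate sentences whose sentence vector is at
    least `threshold`-similar to the topic dimension vector."""
    matched = []
    for tokens in candidates:
        vec = sentence_vector(tokens, word_vectors)
        if vec is not None and cosine_similarity(vec, topic_vector) >= threshold:
            matched.append(tokens)
    return matched


# Toy two-dimensional vectors; a real system would load trained embeddings.
word_vectors = {"汤底": [1.0, 0.0], "浓郁": [0.9, 0.1], "停车": [0.0, 1.0]}
taste_vector = [1.0, 0.0]
kept = match_topic_sentences([["汤底", "浓郁"], ["停车"]], taste_vector, word_vectors)
```

Here the sentence about the broth stays with the "taste" dimension, while the sentence about parking is orthogonal to it and is dropped.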
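It should be understood that the topic sentences obtained through operations 102 and 103 for each topic dimension vector may include positive content related to the user's needs, and may also include negative information. From the perspective of the user's needs, the user expects the article to show positive content in order to achieve a favorable promotional effect. Therefore, before the topic sentences corresponding to the topic dimension vectors are spliced and combined, the following operation is also needed: screening the topic sentences corresponding to each topic dimension vector according to topic core words and/or a specific sentiment analysis algorithm.
According to one possible implementation of the present invention, screening the topic sentences corresponding to each topic dimension vector according to topic core words includes: for any topic sentence corresponding to a topic dimension vector, performing dependency parsing on the topic sentence to obtain its topic core word; determining whether the topic core word belongs to the topic dimension vector to which the topic sentence corresponds; if so, retaining the topic sentence; otherwise, filtering it out. For example, if the topic core word of a sentence is "口味虾" (a spicy crayfish dish), it clearly does not belong to the sentence's topic dimension vector "taste" (口味); the sentence merely contains the characters "口味", so dependency parsing allows it to be filtered out.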
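The membership check that follows dependency parsing can be sketched as below. The parser itself is out of scope: each sentence is assumed to arrive already paired with the core word a dependency parser produced, and `keep_by_core_word` and the word set for the "taste" dimension are hypothetical.

```python
def keep_by_core_word(sentences_with_core, dimension_words):
    """Keep sentences whose topic core word belongs to the topic dimension.

    sentences_with_core: list of (sentence, core_word) pairs, where core_word
    is assumed to come from an external dependency parser.
    dimension_words: words regarded as belonging to the topic dimension."""
    return [
        sentence
        for sentence, core_word in sentences_with_core
        if core_word in dimension_words
    ]


taste_words = {"口味", "味道"}  # hypothetical word set for the "taste" dimension
parsed = [
    ("这家的口味很正宗", "口味"),
    ("口味虾很大只", "口味虾"),  # mentions 口味 only as part of a dish name
]
kept_sentences = keep_by_core_word(parsed, taste_words)
```

Comparing whole core words, not substrings, is what lets the "口味虾" sentence be filtered out even though it contains the characters "口味".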
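According to one possible implementation of the present invention, screening the topic sentences corresponding to each topic dimension vector according to a specific sentiment analysis algorithm includes: for any topic sentence corresponding to a topic dimension vector, determining a sentiment orientation score of the topic sentence using the specific sentiment analysis algorithm; determining from that score whether the sentiment of the topic sentence is positive; if so, retaining the topic sentence; otherwise, filtering it out. Specifically, the sentiment orientation score of a topic sentence can be obtained by combining traditional sentiment analysis (checking whether the sentence contains clearly negative words) with deep-learning sentiment analysis (for example, a BiLSTM algorithm, which takes word order into account when judging sentiment orientation), after which the topic sentences with positive sentiment are retained directly.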
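The traditional half of this screening, checking for clearly negative words, can be sketched as follows. The deep-learning (BiLSTM) half is not modelled, the binary score is a simplification of a real sentiment orientation score, and `NEGATIVE_WORDS`, `sentiment_score` and `keep_positive` are hypothetical.

```python
# Hypothetical list of clearly negative words; a real lexicon would be larger.
NEGATIVE_WORDS = ("难吃", "失望", "太差", "不新鲜")


def sentiment_score(sentence, negative_words=NEGATIVE_WORDS):
    """Return 0.0 if the sentence contains a clearly negative word, else 1.0.

    A binary stand-in for the sentiment orientation score; a trained model
    would be combined with this check in the approach the text describes."""
    return 0.0 if any(word in sentence for word in negative_words) else 1.0


def keep_positive(sentences, min_score=0.5):
    """Retain only the sentences whose score marks them as positive."""
    return [s for s in sentences if sentiment_score(s) >= min_score]


positive = keep_positive(["汤底浓郁,很好吃", "服务太差,很失望"])
```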
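According to one possible implementation of the present invention, before the topic sentences corresponding to the topic dimension vectors are spliced and combined, the method further includes: determining the degree of thought relevance between each topic sentence and its corresponding topic dimension vector; and selecting, according to that degree of thought relevance, core topic sentences from the topic sentences corresponding to each topic dimension vector, so that the core topic sentences are spliced and combined. Specifically, the degree of thought relevance between a topic sentence and its corresponding topic dimension vector can be determined using an RNN and/or TextRank.
Further, MMR is used to deduplicate the obtained topic sentences corresponding to each topic dimension vector; that is, the MMR method judges the similarity of the candidate topic sentences within each topic dimension, so as to obtain, for each topic dimension, topic sentences with low mutual similarity and high information coverage.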
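The goal stated here, low mutual similarity with high information coverage, matches the behaviour of the greedy selection commonly known as Maximal Marginal Relevance. The sketch below assumes that reading of "MMR"; the source itself expands the abbreviation differently elsewhere, so this may not be the exact procedure intended. The relevance scores, the character-level Jaccard similarity and the weight `lam` are all hypothetical choices.

```python
def jaccard(a, b):
    """Character-set Jaccard similarity between two sentences, in [0, 1]."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mmr_select(candidates, relevance, similarity, k, lam=0.7):
    """Greedily pick up to k sentences, trading relevance against redundancy.

    Each step picks the remaining sentence maximizing
        lam * relevance - (1 - lam) * (max similarity to anything selected).
    relevance maps each candidate sentence to a score in [0, 1]."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def mmr_score(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


sentences = ["汤底浓郁微辣", "汤底浓郁微辣的", "环境干净"]
scores = {"汤底浓郁微辣": 0.9, "汤底浓郁微辣的": 0.85, "环境干净": 0.6}
deduplicated = mmr_select(sentences, scores, jaccard, k=2)
```

In this toy run the near-duplicate second sentence loses to the less relevant but novel third one, which is the deduplication effect the text describes.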
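Finally, the topic sentences corresponding to the topic dimension vectors are spliced and combined to generate an article that meets the demand information. In the "Fenjia Liuzhou Luosifen" example, as shown in FIG. 3, the article content for the merchant's generic topic dimension "taste" is: "The taste is really very good. The freshly fried tofu skin, pickled long beans and sour bamboo shoots are all very good, and the dry-tossed version has more flavor. The luosifen in the photo is delicious, with a rich, mildly spicy broth; you can add some pickled chili juice for a more sour-and-spicy kick. The toppings are plentiful and varied. The fried large intestine is crunchy outside and crisp inside; people on a diet should order with caution, it is very rich and oily!". As FIG. 3 also shows, the generated article further includes the article content for the merchant's generic topic dimension "environment", and the article content for each of "luosifen", "pork intestine" and "corn sweet soup" under the merchant's feature topic dimension "signature recommended dishes". The article content of each topic dimension vector is not elaborated one by one here.
In practice, to make the generated article more readable and visually appealing, picture information related to the user's needs can be added while generating the article, producing an article that includes pictures.
According to an embodiment of the present invention, the method further includes: selecting picture information associated with the at least one topic dimension vector from a specific picture library; and fusing the picture information with the generated article to form an article that mixes images and text. Specifically, after at least one topic dimension vector has been mined in operation 102, the picture information associated with it can be selected from the picture knowledge base. Continuing with "Fenjia Liuzhou Luosifen" as an example, as shown in FIG. 3, the generated article includes pictures for the merchant's generic topic dimensions "taste" and "environment", and for "luosifen", "pork intestine" and "corn sweet soup" under the merchant's feature topic dimension "signature recommended dishes".
According to an embodiment of the present invention, the method further includes: finding a specific object attribute and/or sentence corpus that matches the content of the generated article or the demand information; and generating an article title from the found object attribute and/or sentence corpus by means of specific template rules. For example, for a brand in the hot pot category, the article title "The hot pot that has swept the world! Won't you take a look?" can be assembled from specific template rules. Taking the merchant "Fenjia Liuzhou Luosifen" as another example, as shown in FIG. 3, the specific template rule for food-category merchants with an explicit brand name, "The editor's top-secret private delicacy! Yes, please! XXXX", automatically yields the article title "The editor's top-secret private delicacy! Yes, please! Fenjia Liuzhou Luosifen".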
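Title generation from a template rule reduces to slot filling, as the following sketch shows. The rule key `food_brand`, the `TITLE_TEMPLATES` table and `make_title` are hypothetical, and the template text is rendered in English here; only the idea of a rule with a placeholder for the brand name comes from the source.

```python
# Hypothetical template table keyed by a rule name; "{name}" marks the slot
# that the source writes as XXXX.
TITLE_TEMPLATES = {
    "food_brand": "The editor's top-secret private delicacy! Yes, please! {name}",
}


def make_title(rule, name, templates=TITLE_TEMPLATES):
    """Fill the brand or merchant name into the template chosen by `rule`."""
    return templates[rule].format(name=name)


title = make_title("food_brand", "Fenjia Liuzhou Luosifen")
```

How a rule is chosen for a given merchant (for example, from its category and whether it has an explicit brand name) is left out of this sketch.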
图4示出了本发明实施例的文章生成设备的组成结构示意图。
如图4所示,该文章生成设备40包括:内容源挖掘装置401,用于基于用户输入的需求信息挖掘内容源;主题维度向量提取装置402,用于利用特定主题生成模型从挖掘出的内容源中提取至少一个主题维度向量;主题句挖掘装置403,用于针对所提取的每一个所述主题维度向量,依据所述主题维度向量对所述内容源进行主题句挖掘,得到对应于所述主题维度向量的主题句;主题句拼接生成装置404,用于对对应于所述至少一个主题维度向量的主题句进行拼接合成,生成符合所述需求信息的文章。
根据本发明一实施方式,如图4所示,所述内容源挖掘装置401包括:解析模块4011,用于解析所述需求信息,得到文章描述对象和/或条件信息;挖掘模块4012,用于挖掘与所述文章描述对象和/或条件信息相匹配的兴趣点;确定模块4013,用于将对应于所述兴趣点的用户原创内容确定为内容源。
根据本发明一实施方式,所述解析模块4011解析所述需求信息包括以下操作至少之一:对所述需求信息进行分词;对所述需求信息进行词性分析;对所述需求信息进行命名实体识别;以及对所述需求信息进行语义归一化处理。
根据本发明一实施方式,所述与所述文章描述对象和/或条件信息相匹配的兴趣点可为多个。在这种情况下,所述确定模块4013还用于,将对应于所述兴趣点的用户原创内容确定为内容源,包括:获取与所述文章描述对象和/或条件信息相匹配的多个兴趣点中每个所述兴趣点的用户评分;根据每个所述兴趣点的用户评分从所述多个兴趣点中筛选出满足特定筛选条件的目标兴趣点;将对应于所述目标兴趣点的用户原创内容确定为所述内容源。
根据本发明一实施方式,如图4所示,所述主题维度向量提取装置402包括:提取模块4021,用于通过所述特定主题生成模型,从所述内容源所包括的用户原创内容中,提取至少一个主题描述词;转换模块4022,用于通过第一词向量转换模型将所提取的主题描述词转换为至少一个主题维度向量。
根据本发明一实施方式,如图4所示,所述主题句挖掘装置403包括:分句处理模块4031,用于对所述内容源中的用户原创内容进行分句和/或过滤操作,得到与所述主题维度向量相对应的一个或多个候选主题句;计算模块4032,用于计算每一个候选主题句的句子向量与主题维度向量之间的余弦相似度;判断模块4033,用于将所计算出的余弦相似度符合特定阈值的候选主题句,确定作为对应主题维度向量的主题句。
根据本发明一实施方式,所述计算模块4032还用于,针对候选主题句进行分词处理,得到对应于所述候选主题句的分词结果;通过第二词向量转换模型将所述分词结果转换成句子向量;计算所转换成的句子向量和主题维度向量之间的余弦相似度。
根据本发明一实施方式,如图4所示,所述设备还包括主题句筛选装置405。所述主题句筛选装置405,用于在通过所述主题句拼接生成装置404对对应于所述至少一个主题维度向量的主题句进行拼接合成,包括:根据主题核心词和/或特定情感分析算法,对对应于每一个主题维度向量的主题句进行筛选;对通过所述筛选被保留的所述主题句进行拼接合成。
根据本发明一实施方式,所述主题句筛选装置405根据主题核心词,对对应于主题维度向量的主题句进行筛选,包括:针对对应于主题维度向量的任意一个主题句,对所述主题句进行依存句法分析,得到所述主题句的主题核心词;判断所述主题句的主题核心词是否属于所述主题句所对应的主题维度向量;如果是,则保留所述主题句;否则,过滤所述主题句。根据本发明一实施方式,所述主题句筛选装置405根据情感分析算法,对对应于每一个主题维度向量的主题句进行筛选包括:针对对应于每一个主题维度向量的任意一个主题句,利用特定情感分析算法确定所述主题句的情感倾向分;根据所述主题句的情感倾向分判断所述主题句的情感是否呈正向;如果是,则保留所述主题句;否则,过滤所述主题句。
根据本发明一实施方式,所述主题句拼接生成装置404还用于,在对对应于每一个主题维度向量的主题句进行拼接合成之前,确定所述主题句与对应的主题维度向量之间的思想关联度;根据所述主题句与对应的主题维度向量之间的思想关联度,从对应于每一个主题维度向量的主题句中挑选出核心主题句,以对所述核心主题句进行拼接合成。
根据本发明一实施方式,所述主题句拼接生成装置404还用于,通过循环神经网络和关键词提取算法,确定所述主题句与对应的主题维度向量之间的思想关联度。
根据本发明一实施方式,所述主题句拼接生成装置404还用于,在对所述每一个主题维度向量的主题句进行拼接合成之前,采用匹配机制对所得到的对应于每一个主题维度向量的主题句进行去重处理。
根据本发明一实施方式,如图4所示,所述设备还包括图文融合装置406。所述图文融合装置406,用于从特定图片库中挑选与所述至少一个主题维度向量相关联的图片信息;将所述图片信息与所生成的文章进行融合,以形成图文混合的文章。
根据本发明一实施方式,如图4所示,所述设备还包括文章标题生成装置407。所述文章标题生成装置407,用于查找与所生成的文章的内容或所述需求信息相匹配的特定对象定语和/或句子语料;利用查找到的特定对象定语和/或句子语料,通过特定模板规则生成文章标题。
这里需要指出的是:以上设备实施例中的描述,与上述方法描述是类似的,同方法的有益效果描述,不作赘述。对于本发明设备实施例中未披露的技术细节,请参照本发明方法实施例的描述。
示例性设备
在介绍了本发明示例性实施方式的方法和设备之后,接下来,介绍根据本发明的另一示例性实施方式的文章生成设备。
所属技术领域的技术人员能够理解,本发明的各个方面可以实现为系统、方法或程序产品。因此,本发明的各个方面可以具体实现为以下形式,即:完全的硬件实施方式、完全的软件实施方式(包括固件、微代码等),或硬件和软件方面结合的实施方式,这里可以统称为“电路”、“模块”或“系统”。
在一些可能的实施方式中,本发明的文章生成设备可以至少包括一个或多个处理器、以及至少一个存储器。其中,所述存储器存储有程序,当所述程序被所述处理器执行时,使得所述处理器执行本说明书中描述各个步骤,例如,所述处理器可以执行如图1中所示的操作101,基于用户输入的需求信息挖掘内容源;操作102,利用特定主题生成模型从挖掘出的所述内容源中提取至少一个主题维度向量;操作103,针对所提取的每一个所述主题维度向量,依据所述主题维度向量所述内容源进行主题句挖掘,得到对应于主题维度向量的主题句;以及操作104,对对应于所述至少一个主题维度向量的主题句进行拼接合成,生成符合所述需求信息的文章。
图5示出了根据本发明实施方式的文章生成设备的示意图。
下面参照图5来描述根据本发明的这种实施方式的文章生成设备。图5显示的设备500仅仅是一个示例,不应对本发明实施例的功能和使用范围带来任何限制。
如图5所示,设备500以通用计算设备的形式表现,包括但不限于:上述至少一个处理器510、上述至少一个存储器520、连接不同系统组件(包括存储器520和处理器510)的总线560。
总线560包括地址总线,控制总线和数据总线。
存储器520可以包括易失性存储器,例如随机存取存储器(RAM)521和/或高速缓存存储器522,还可以进一步包括只读存储器(ROM)523。
存储器520还可以包括一组(至少一个)程序模块524,这样的程序模块524包括但不限于:操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。
设备500还可以与一个或多个外部设备50(例如键盘、指向设备、蓝牙设备等)通信。这种通信可以通过输入/输出(I/O)接口540进行,并在显示单元530上进行显示。并且,设备500还可以通过网络适配器550与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器550通过总线560与设备500中的其它模块通信。应当明白,尽管图中未示出,但可以结合设备500使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。
示例性计算机可读存储介质
在一些可能的实施方式中,本发明的各个方面还可以实现为一种计算机可读存储介质的形式,其包括程序代码,当所述程序代码在被处理器执行时,所述程序代码用于使所述处理器执行上面描述的方法的各个步骤,例如,所述处理器可以执行如图1中所示的操作101,基于用户输入的需求信息挖掘内容源;操作102,利用特定主题生成模型从挖掘出的所述内容源中提取至少一个主题维度向量;操作103,针对所提取的每一个所述主题维度向量,依据所述主题维度向量所述内容源进行主题句挖掘,得到对应于主题维度向量的主题句;以及操作104,对对应于所述至少一个主题维度向量的主题句进行拼接合成,生成符合所述需求信息的文章。
所述计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。
图6示出了根据本发明实施方式的基于人工智能的文章生成的计算机可读存储介 质的示意图。
如图6所示,描述了根据本发明的实施方式的计算机可读存储介质3,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在终端设备,例如个人电脑上运行。然而,本发明的计算机可读存储介质不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
可以以一种或多种程序设计语言的任意组合来编写用于执行本发明操作的程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。
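In addition, although the operations of the method of the present invention are described in a particular order in the drawings, this does not require or imply that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, several steps may be combined into one, and/or one step may be broken down into several.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the embodiments disclosed, and that the division into aspects does not mean that features of those aspects cannot be combined to advantage; the division is made only for convenience of presentation. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.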

Claims (32)

  1. 一种文章生成方法,包括:
    基于用户输入的需求信息挖掘内容源;
    利用特定主题生成模型,从挖掘出的所述内容源中提取至少一个主题维度向量;
    针对所提取的每一个所述主题维度向量,依据所述主题维度向量对所述内容源进行主题句挖掘,得到对应于所述主题维度向量的主题句;以及
    对对应于所述至少一个主题维度向量的所述主题句进行拼接合成,生成符合所述需求信息的文章。
  2. 根据权利要求1所述的方法,基于用户输入的所述需求信息挖掘所述内容源,包括:
    解析所述需求信息,得到文章描述对象和/或条件信息;
    挖掘与所述文章描述对象和/或条件信息相匹配的兴趣点;
    将对应于所述兴趣点的用户原创内容确定为所述内容源。
  3. 根据权利要求2所述的方法,解析所述需求信息包括以下操作至少之一:
    对所述需求信息进行分词;
    对所述需求信息进行词性分析;
    对所述需求信息进行命名实体识别;以及
    对所述需求信息进行语义归一化处理。
  4. 根据权利要求2所述的方法,将对应于所述兴趣点的用户原创内容确定为内容源,包括:
    获取与所述文章描述对象和/或条件信息相匹配的多个兴趣点中每个所述兴趣点的用户评分;
    根据每个所述兴趣点的用户评分从所述多个兴趣点中筛选出满足特定筛选条件的目标兴趣点;
    将对应于所述目标兴趣点的用户原创内容确定为所述内容源。
  5. 根据权利要求1所述的方法,利用特定主题生成模型,从所述内容源中提取至少一个主题维度向量,包括:
    通过所述特定主题生成模型,从所述内容源所包括的用户原创内容中提取至少一个主题描述词;
    通过第一词向量转换模型,将所提取的所述主题描述词转换为所述至少一个主题维度向量。
  6. 根据权利要求1所述的方法,依据所述主题维度向量对所述内容源进行主题句挖掘,得到对应于所述主题维度向量的主题句,包括:
    对所述内容源中的用户原创内容进行分句和/或过滤操作,得到与所述主题维度向量相对应的一个或多个候选主题句;
    计算每一个所述候选主题句的句子向量与所述主题维度向量之间的余弦相似度;
    将所计算出的所述余弦相似度符合阈值的所述候选主题句,确定作为对应所述主题维度向量的主题句。
  7. 根据权利要求6所述的方法,计算所述候选主题句的句子向量与所述主题维度向量之间的余弦相似度,包括:
    针对所述候选主题句进行分词处理,得到对所述候选主题句的分词结果;
    通过第二词向量转换模型将所述分词结果转换成句子向量;
    计算所转换成的所述句子向量和所述主题维度向量之间的余弦相似度。
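  8. The method according to claim 1, wherein splicing and synthesizing the topic sentences corresponding to the at least one topic dimension vector comprises:
    screening the topic sentences corresponding to each topic dimension vector according to topic core words and/or a specific sentiment analysis algorithm; and
    splicing and synthesizing the topic sentences retained by the screening.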
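  9. The method according to claim 8, wherein screening the topic sentences corresponding to the topic dimension vector according to the topic core words comprises:
    for any topic sentence corresponding to the topic dimension vector, performing dependency parsing on the topic sentence to obtain the topic core word of the topic sentence;
    determining whether the topic core word of the topic sentence belongs to the topic dimension vector to which the topic sentence corresponds;
    if so, retaining the topic sentence;
    otherwise, filtering out the topic sentence.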
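  10. The method according to claim 8, wherein screening the topic sentences corresponding to the topic dimension vector according to the specific sentiment analysis algorithm comprises:
    for any topic sentence corresponding to the topic dimension vector, determining a sentiment tendency score of the topic sentence by using the specific sentiment analysis algorithm;
    determining, according to the sentiment tendency score of the topic sentence, whether the sentiment of the topic sentence is positive;
    if so, retaining the topic sentence;
    otherwise, filtering out the topic sentence.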
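  11. The method according to claim 1, wherein splicing and synthesizing the topic sentences corresponding to the at least one topic dimension vector comprises:
    determining a degree of thought relevance between each topic sentence and the corresponding topic dimension vector;
    selecting core topic sentences from the topic sentences corresponding to each topic dimension vector according to the degree of thought relevance between each topic sentence and the corresponding topic dimension vector; and
    splicing and synthesizing the core topic sentences.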
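  12. The method according to claim 11, wherein determining the degree of thought relevance between the topic sentence and the corresponding topic dimension vector comprises:
    determining the degree of thought relevance between the topic sentence and the corresponding topic dimension vector by using a recurrent neural network and/or a keyword extraction algorithm.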
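  13. The method according to claim 1, wherein before splicing and synthesizing the topic sentences corresponding to the at least one topic dimension vector, the method further comprises:
    deduplicating the obtained topic sentences corresponding to each topic dimension vector by using a matching mechanism.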
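  14. The method according to claim 13, further comprising:
    selecting, from a specific picture library, picture information associated with the at least one topic dimension vector; and
    fusing the picture information with the generated article to form an article that combines text and pictures.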
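  15. The method according to any one of claims 1 to 14, further comprising:
    searching for specific object attributives and/or sentence corpora that match the content of the generated article or the requirement information; and
    generating an article title from the found specific object attributives and/or sentence corpora according to specific template rules.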
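  16. An article generation device, comprising:
    a content source mining apparatus, configured to mine a content source based on requirement information input by a user;
    a topic dimension vector extraction apparatus, configured to extract at least one topic dimension vector from the mined content source by using a specific topic generation model;
    a topic sentence mining apparatus, configured to, for each extracted topic dimension vector, perform topic sentence mining on the content source according to the topic dimension vector to obtain topic sentences corresponding to the topic dimension vector; and
    a topic sentence splicing and generation apparatus, configured to splice and synthesize the topic sentences corresponding to the at least one topic dimension vector to generate an article that meets the requirement information.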
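  17. The device according to claim 16, wherein the content source mining apparatus comprises:
    a parsing module, configured to parse the requirement information to obtain an article description object and/or condition information;
    a mining module, configured to mine points of interest that match the article description object and/or condition information; and
    a determining module, configured to determine user-generated content corresponding to the points of interest as the content source.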
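  18. The device according to claim 17, wherein the parsing of the requirement information by the parsing module comprises at least one of the following operations:
    performing word segmentation on the requirement information;
    performing part-of-speech analysis on the requirement information;
    performing named entity recognition on the requirement information; and
    performing semantic normalization on the requirement information.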
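  19. The device according to claim 17, wherein the determining module is further configured to:
    after the mining module mines a plurality of points of interest that match the article description object and/or condition information, obtain a user rating for each of the plurality of points of interest;
    select, from the plurality of points of interest and according to the user rating of each point of interest, a target point of interest that satisfies a specific screening condition; and
    determine user-generated content corresponding to the target point of interest as the content source.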
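  20. The device according to claim 16, wherein the topic dimension vector extraction apparatus comprises:
    an extraction module, configured to extract at least one topic descriptor from the user-generated content included in the content source by using the specific topic generation model; and
    a conversion module, configured to convert the extracted topic descriptor into the at least one topic dimension vector by using a first word vector conversion model.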
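  21. The device according to claim 16, wherein the topic sentence mining apparatus comprises:
    a sentence splitting module, configured to perform sentence splitting and/or filtering on the user-generated content in the content source to obtain one or more candidate topic sentences corresponding to the topic dimension vector;
    a calculation module, configured to calculate the cosine similarity between the sentence vector of each candidate topic sentence and the topic dimension vector; and
    a judgment module, configured to determine each candidate topic sentence whose calculated cosine similarity meets a specific threshold as a topic sentence corresponding to the topic dimension vector.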
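  22. The device according to claim 21, wherein the calculation module is further configured to:
    perform word segmentation on the candidate topic sentence to obtain a word segmentation result for the candidate topic sentence;
    convert the word segmentation result into a sentence vector by using a second word vector conversion model; and
    calculate the cosine similarity between the resulting sentence vector and the topic dimension vector.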
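  23. The device according to claim 16, further comprising a topic sentence screening apparatus,
    wherein the topic sentence screening apparatus is configured to, before the topic sentence splicing and generation apparatus splices and synthesizes the topic sentences corresponding to the topic dimension vector, screen the topic sentences corresponding to each topic dimension vector according to topic core words and/or a specific sentiment analysis algorithm.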
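  24. The device according to claim 23, wherein the topic sentence screening apparatus is configured to:
    for any topic sentence corresponding to the topic dimension vector, perform dependency parsing on the topic sentence to obtain the topic core word of the topic sentence;
    determine whether the topic core word of the topic sentence belongs to the topic dimension vector to which the topic sentence corresponds; and
    if so, retain the topic sentence; otherwise, filter out the topic sentence.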
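  25. The device according to claim 23, wherein the topic sentence screening apparatus is configured to:
    for any topic sentence corresponding to the topic dimension vector, determine a sentiment tendency score of the topic sentence by using a specific sentiment analysis algorithm;
    determine, according to the sentiment tendency score of the topic sentence, whether the sentiment of the topic sentence is positive; and
    if so, retain the topic sentence; otherwise, filter out the topic sentence.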
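  26. The device according to claim 16, wherein the topic sentence splicing and generation apparatus is further configured to:
    before splicing and synthesizing the topic sentences corresponding to each topic dimension vector, determine a degree of thought relevance between each topic sentence and the corresponding topic dimension vector;
    select core topic sentences from the topic sentences corresponding to each topic dimension vector according to the degree of thought relevance between each topic sentence and the corresponding topic dimension vector; and
    splice and synthesize the core topic sentences.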
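  27. The device according to claim 26, wherein the topic sentence splicing and generation apparatus is further configured to:
    determine the degree of thought relevance between the topic sentence and the corresponding topic dimension vector by using a recurrent neural network and a keyword extraction algorithm.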
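  28. The device according to claim 16, wherein the topic sentence splicing and generation apparatus is further configured to:
    before splicing and synthesizing the topic sentences of each topic dimension vector, deduplicate the obtained topic sentences corresponding to each topic dimension vector by using a matching mechanism.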
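  29. The device according to claim 16, further comprising an image-text fusion apparatus,
    wherein the image-text fusion apparatus is configured to select, from a specific picture library, picture information associated with the at least one topic dimension vector; and
    fuse the picture information with the generated article to form an article that combines text and pictures.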
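  30. The device according to any one of claims 16 to 29, further comprising an article title generation apparatus,
    wherein the article title generation apparatus is configured to search for specific object attributives and/or sentence corpora that match the content of the generated article or the requirement information; and
    generate an article title from the found specific object attributives and/or sentence corpora according to specific template rules.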
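  31. An article generation device, comprising:
    one or more processors;
    a memory; and
    a program stored in the memory, wherein the program, when executed by the one or more processors, causes the processors to perform the method according to any one of claims 1 to 15.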
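  32. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, causes the processor to perform the method according to any one of claims 1 to 15.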
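
The topic sentence mining recited in claims 6 and 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the toy word vectors, whitespace tokenization (standing in for word segmentation), averaging word vectors to form a sentence vector, and the 0.8 threshold with a greater-than-or-equal comparison are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_vector(tokens: List[str],
                    word_vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Assumed 'second word vector conversion model': average the known word vectors."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def mine_topic_sentences(candidates: List[str],
                         topic_vector: Sequence[float],
                         word_vectors: Dict[str, List[float]],
                         threshold: float = 0.8) -> List[str]:
    """Keep candidates whose sentence vector is close enough to the topic vector."""
    selected = []
    for sentence in candidates:
        tokens = sentence.split()  # hypothetical stand-in for word segmentation
        vec = sentence_vector(tokens, word_vectors, len(topic_vector))
        if cosine_similarity(vec, topic_vector) >= threshold:
            selected.append(sentence)
    return selected


# Toy two-dimensional embeddings, invented for this example only.
toy_vectors = {
    "tasty": [0.9, 0.1],
    "noodles": [1.0, 0.0],
    "easy": [0.1, 0.9],
    "parking": [0.0, 1.0],
}
taste_topic = [1.0, 0.0]  # hypothetical "taste" topic dimension vector
result = mine_topic_sentences(["tasty noodles", "easy parking"],
                              taste_topic, toy_vectors)
```

With these toy inputs only the sentence about taste passes the threshold. In practice the threshold and the vector models would be tuned to the content source, which the claims leave open.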
PCT/CN2018/121310 2018-05-15 2018-12-14 Article generation WO2019218660A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/097,405 US11288454B2 (en) 2018-05-15 2020-11-13 Article generation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810462391.7 2018-05-15
CN201810462391.7A CN108694160B (zh) 2018-05-15 2018-05-15 Article generation method, device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/097,405 Continuation US11288454B2 (en) 2018-05-15 2020-11-13 Article generation

Publications (1)

Publication Number Publication Date
WO2019218660A1 (zh) 2019-11-21

Family

ID=63846300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/121310 WO2019218660A1 (zh) 2018-05-15 2018-12-14 Article generation

Country Status (3)

Country Link
US (1) US11288454B2 (zh)
CN (1) CN108694160B (zh)
WO (1) WO2019218660A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
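CN108694160B (zh) * 2018-05-15 2021-01-22 北京三快在线科技有限公司 Article generation method, device, and storage medium
CN109657043B (zh) * 2018-12-14 2022-01-04 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for automatically generating articles
CN109885821B (zh) * 2019-03-05 2023-07-18 中国联合网络通信集团有限公司 Artificial-intelligence-based article writing method and apparatus, and computer storage medium
US11210470B2 (en) * 2019-03-28 2021-12-28 Adobe Inc. Automatic text segmentation based on relevant context
CN110377891B (zh) * 2019-06-19 2023-01-06 北京百度网讯科技有限公司 Method, apparatus, device, and computer-readable storage medium for generating event analysis articles
CN111182332B (zh) * 2019-12-31 2022-03-22 广州方硅信息技术有限公司 Video processing method, apparatus, server, and storage medium
CN111814482B (zh) * 2020-09-03 2020-12-11 平安国际智慧城市科技股份有限公司 Method, system, and computer device for extracting key data from text
CN112989187B (zh) * 2021-02-25 2022-02-01 平安科技(深圳)有限公司 Method, apparatus, computer device, and storage medium for recommending creative materials
CN115204118B (zh) * 2022-07-12 2023-06-27 平安科技(深圳)有限公司 Article generation method, apparatus, computer device, and storage medium
CN115965013B (zh) * 2023-03-16 2023-11-28 北京朗知网络传媒科技股份有限公司 Method and apparatus for generating automotive media articles based on requirement recognition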
US11868313B1 (en) 2023-03-28 2024-01-09 Lede AI Apparatus and method for generating an article

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
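CN106503255A (zh) * 2016-11-15 2017-03-15 科大讯飞股份有限公司 Method and system for automatically generating articles based on descriptive text
CN106663087A (zh) * 2014-10-01 2017-05-10 株式会社日立制作所 Article generation system
CN106844322A (zh) * 2017-01-22 2017-06-13 百度在线网络技术(北京)有限公司 Intelligent article generation method and apparatus
CN106970898A (zh) * 2017-03-31 2017-07-21 百度在线网络技术(北京)有限公司 Method and apparatus for generating articles
CN108694160A (zh) * 2018-05-15 2018-10-23 北京三快在线科技有限公司 Article generation method, device, and storage medium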

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6356864B1 (en) * 1997-07-25 2002-03-12 University Technology Corporation Methods for analysis and evaluation of the semantic content of a writing based on vector length
NO316480B1 (no) * 2001-11-15 2004-01-26 Forinnova As Method and system for textual examination and discovery
US7720675B2 (en) * 2003-10-27 2010-05-18 Educational Testing Service Method and system for determining text coherence
US8200589B2 (en) * 2006-07-28 2012-06-12 Persistent Systems Limited System and method for network association inference, validation and pruning based on integrated constraints from diverse data
US8296168B2 (en) * 2006-09-13 2012-10-23 University Of Maryland System and method for analysis of an opinion expressed in documents with regard to a particular topic
US20100128042A1 (en) * 2008-07-10 2010-05-27 Anthony Confrey System and method for creating and displaying an animated flow of text and other media from an input of conventional text
US9047283B1 (en) * 2010-01-29 2015-06-02 Guangsheng Zhang Automated topic discovery in documents and content categorization
US10235681B2 (en) * 2013-10-15 2019-03-19 Adobe Inc. Text extraction module for contextual analysis engine
CA2973706A1 (en) * 2014-01-15 2015-07-23 Intema Solutions Inc. Item classification method and selection system for electronic solicitation
CN105095229A (zh) * 2014-04-29 2015-11-25 国际商业机器公司 Method for training a topic model, method for comparing document content, and corresponding apparatus
US10073837B2 (en) * 2014-07-31 2018-09-11 Oracle International Corporation Method and system for implementing alerts in semantic analysis technology
US10042923B2 (en) * 2015-04-24 2018-08-07 Microsoft Technology Licensing, Llc Topic extraction using clause segmentation and high-frequency words
US9928300B2 (en) * 2015-07-16 2018-03-27 NewsRx, LLC Artificial intelligence article analysis interface
CN106933789B (zh) * 2015-12-30 2023-06-20 阿里巴巴集团控股有限公司 Travel guide generation method and generation system
US9886501B2 (en) * 2016-06-20 2018-02-06 International Business Machines Corporation Contextual content graph for automatic, unsupervised summarization of content
US20180160200A1 (en) * 2016-12-03 2018-06-07 Streamingo Solutions Private Limited Methods and systems for identifying, incorporating, streamlining viewer intent when consuming media
US10380259B2 (en) * 2017-05-22 2019-08-13 International Business Machines Corporation Deep embedding for natural language content based on semantic dependencies
US10776566B2 (en) * 2017-05-24 2020-09-15 Nathan J. DeVries System and method of document generation
US20190266288A1 (en) * 2018-02-28 2019-08-29 Laserlike, Inc. Query topic map

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
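CN113342980A (zh) * 2021-06-29 2021-09-03 中国平安人寿保险股份有限公司 PPT text mining method and apparatus, computer device, and storage medium
CN113342980B (zh) * 2021-06-29 2024-05-17 中国平安人寿保险股份有限公司 PPT text mining method and apparatus, computer device, and storage medium
CN114706974A (zh) * 2021-09-18 2022-07-05 北京墨丘科技有限公司 Technical problem information mining method, apparatus, and storage medium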

Also Published As

Publication number Publication date
CN108694160B (zh) 2021-01-22
US20210064823A1 (en) 2021-03-04
CN108694160A (zh) 2018-10-23
US11288454B2 (en) 2022-03-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919324

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18919324

Country of ref document: EP

Kind code of ref document: A1