CN110956021A - Original article generation method, device, system and server - Google Patents
- Publication number
- CN110956021A (application CN201911112545.0A)
- Authority
- CN
- China
- Prior art keywords
- event
- hot
- theme
- news
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of the Internet and discloses a method, a device, a system and a server for generating original articles. The method comprises the following steps: acquiring hot events; screening the acquired hot events through a pre-trained theme analysis model to determine the hot events related to a preset theme; acquiring a news set related to each hot event that is related to the preset theme; generating an event abstract of that hot event according to the news set; and acquiring a subject short text based on a pre-generated subject knowledge base and generating the original article by combining it with the event abstract. Because the event abstract is generated from the news set, the hot event can be summarized accurately; because the subject short text is drawn from the pre-generated subject knowledge base and combined with the event abstract, the context coherence of the article is improved.
Description
Technical Field
The invention relates to the technical field of internet, in particular to a method, a device, a system and a server for generating original articles.
Background
With the development of Internet technology, text generation technology has emerged, but compared with the breakthroughs achieved in image generation it still faces many challenges. In recent years many article generators have appeared on the market; according to the technology they use, they fall mainly into two types: template-based and deep-learning-based.
However, text generated by template-based schemes has a monotonous structure, while text generated by deep learning models can hardly guarantee the logical continuity of its context.
Based on this, there is a need for improvement in the art.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method, an apparatus, a system and a server for generating original articles, which can improve context consistency of the articles.
In a first aspect, an embodiment of the present invention provides a method for generating an original article, including:
acquiring a hot event;
screening the obtained hot events through a pre-trained theme analysis model, and determining the hot events related to a preset theme;
acquiring a news set related to the hot event related to the preset theme according to the hot event related to the preset theme;
generating an event abstract of the hot event according to the news set;
and acquiring a subject short text based on a pre-generated subject knowledge base, and generating the original article by combining the event abstract.
In some embodiments, the determining of a hot event related to a preset theme comprises:
generating a theme category label, and identifying the hot event through the theme category label.
In some embodiments, the generating of an event abstract of the hot event related to the preset theme according to the news set includes:
extracting the original abstracts related to the news event from the news set based on a pre-trained multi-document abstract extraction model to generate a candidate abstract set;
and generating the event abstract of the hot event according to the candidate abstract set.
In some embodiments, the generating of the event abstract of the hot event related to the preset theme according to the candidate abstract set includes:
determining an abstract to be rewritten from the candidate abstract set;
and rewriting the abstract to be rewritten based on a pre-trained synonym sentence rewriting model to generate the event abstract of the hot event related to the preset theme.
In some embodiments, the acquiring of a subject short text based on a pre-generated subject knowledge base and the generating of the original article in combination with the event abstract include:
acquiring, according to the theme category label, the subject short texts related to the theme category from the subject knowledge base;
and automatically splicing the event abstract and the subject short texts to generate the original article.
In some embodiments, after the acquiring of the subject short texts related to the theme category from the subject knowledge base according to the theme category label, the method further includes:
screening out a single optimal short text from the at least two subject short texts acquired;
and rewriting the optimal short text based on a pre-trained synonym sentence rewriting model to generate the event short text of the hot event.
In some embodiments, the automatic splicing of the event abstract and the subject short text to generate the original article includes:
and automatically splicing the event abstract and the event short text to generate the original article.
In some embodiments, the preset theme comprises an insurance theme, and the theme repository comprises an insurance repository.
In a second aspect, an embodiment of the present invention provides an apparatus for generating an original article, including:
a hot event acquisition unit for acquiring a hot event;
the hot event determining unit is used for screening the obtained hot events through a pre-trained theme analysis model and determining the hot events related to a preset theme;
a news set acquiring unit, configured to acquire, according to the hot event related to the preset theme, a news set related to that hot event;
the event abstract generating unit is used for generating an event abstract of the hot event related to the preset theme according to the news set;
and the original article generating unit is used for acquiring the subject short articles based on a pre-generated subject knowledge base and generating the original articles by combining the event abstract.
In some embodiments, the hot event determining unit is specifically configured to:
and generating a theme category label, and identifying the hot event through the theme category label.
In some embodiments, the event summary generation unit includes:
the candidate abstract collection module is used for extracting the original abstract related to the news event from the news collection based on a pre-trained multi-document abstract extraction model to generate a candidate abstract collection;
and the event abstract generating module is used for generating the event abstract of the hot event according to the candidate abstract set.
In some embodiments, the event summary generation module is specifically configured to:
determining a summary to be rewritten from the candidate summary set;
and rewriting the abstract to be rewritten based on a pre-trained synonym sentence rewriting model to generate the event abstract of the hot event.
In some embodiments, the original article generating unit includes:
the theme short text acquisition module is used for acquiring the theme short text related to the theme type from the theme knowledge base according to the theme type label;
and the original article generating module is used for automatically splicing the event abstract and the subject short text to generate the original article.
In some embodiments, the original article generating unit further includes:
the optimal short text generation module is used for screening out the only optimal short text from the obtained at least two subject short texts;
and the event short text generation module is used for rewriting the optimal short text based on a pre-trained synonym sentence rewriting model to generate the event short text of the hot event.
In some embodiments, the original article generation module is specifically configured to:
and automatically splicing the event abstract and the event short text to generate the original article.
In some embodiments, the preset theme comprises an insurance theme, and the theme repository comprises an insurance repository.
In a third aspect, an embodiment of the present invention provides a server, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating an original article described above.
In a fourth aspect, an embodiment of the present invention provides a system for generating an original article, including:
the above-mentioned server;
the third-party news platform is in communication connection with the server and comprises a third-party news library used for storing hot news information so that the server can acquire the hot news information;
and the article publishing platform is in communication connection with the server and is used for publishing the original article generated by the server.
In a fifth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium, which stores computer-executable instructions for causing a server to execute the method for generating an original article described above.
In the method for generating the original article provided by each embodiment of the invention, firstly, a hot event is obtained; screening the obtained hot events through a pre-trained theme analysis model, and determining the hot events related to a preset theme; acquiring a news set related to the hot event related to the preset theme according to the hot event related to the preset theme; generating an event abstract of the hot event related to the preset theme according to the news set; and acquiring a subject short text based on a pre-generated subject knowledge base, and generating the original article by combining the event abstract. On one hand, hot news information is screened through a pre-trained topic analysis model to determine a hot event related to a preset topic, and on the other hand, a news set related to the hot event is acquired and an event abstract is generated based on the news set, so that the event abstract of the hot event can be accurately summarized, and an original article is generated based on a pre-generated topic knowledge base and the event abstract, so that the context coherence of the article can be improved.
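The five steps summarized above can be sketched as a simple composition of stages. This is only an illustrative skeleton: the function name `generate_articles` and the four injected callables are hypothetical stand-ins for the topic analysis model, the news retrieval, the abstract generation and the knowledge base lookup, none of which are specified at code level in this document.

```python
from typing import Callable, List, Optional


def generate_articles(
    hot_events: List[str],
    classify: Callable[[str], Optional[str]],   # stand-in for the topic analysis model
    fetch_news: Callable[[str], List[str]],     # stand-in for news set retrieval
    summarize: Callable[[List[str]], str],      # stand-in for event abstract generation
    lookup_short_text: Callable[[str], str],    # stand-in for the knowledge base lookup
) -> List[str]:
    """Return one article per hot event that matches the preset theme."""
    articles = []
    for event in hot_events:                    # step 1: hot events already acquired
        label = classify(event)                 # step 2: screen by preset theme
        if label is None:
            continue                            # unrelated event, drop it
        news_set = fetch_news(event)            # step 3: related news set
        abstract = summarize(news_set)          # step 4: event abstract
        short_text = lookup_short_text(label)   # step 5: subject short text
        articles.append(abstract + "\n\n" + short_text)  # splice into one article
    return articles
```

Passing the stages in as callables keeps the sketch independent of any particular model or data source; each stage can be replaced without touching the control flow.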
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals denote similar elements; the figures are not to scale unless otherwise specified.
FIG. 1 is a schematic structural diagram of a system for generating an original article according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for generating an original article according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a topic analysis model based on a BERT pre-training model according to an embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S40 in FIG. 2;
FIG. 5 is a detailed flowchart of step S42 in FIG. 4;
FIG. 6 is a detailed flowchart of step S50 in FIG. 2;
FIG. 7 is a schematic diagram of a Pointer-Generator Networks network structure according to an embodiment of the present invention;
FIG. 8 is another flowchart of a method for generating an original article according to an embodiment of the present invention;
FIG. 9 is an interaction diagram of a system for generating original articles according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an original article generation apparatus provided in an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, provided they do not conflict, the various features of the embodiments of the invention may be combined with each other, all within the scope of protection of the invention. Additionally, although functional modules are divided in the apparatus schematics and logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flowcharts. The terms "first", "second", "third", and the like used in the present invention do not limit the data or the execution order, but merely distinguish identical or similar items having substantially the same function and effect.
Compared with the breakthroughs achieved in image generation, text generation still faces many challenges. In recent years many article generators have appeared on the market; according to the technology they use, they fall mainly into two types: template-based and deep-learning-based.
The template-based text generation scheme is mainly suitable for fields in which articles have a uniform structure and structured data is abundant, such as weather forecasts, financial news and sports news, and the generated articles are highly readable. Its principle is that a series of article templates for a given field is prepared in advance, by machine mining or manually, and article generation then consists of filling structured data into the templates.
Text generation based on deep learning models relies on carefully designed neural network structures, usually with millions or even billions of parameters, and requires large-scale corpora for training. It is widely applied in scenarios such as image caption generation, speech-to-text conversion, chatbots, synonymous sentence rewriting, article summarization and translation. The deep-learning-based method requires large-scale training samples and powerful computing resources, and its output is hard to control; in particular, for longer texts such as paragraphs and chapters, it is difficult to guarantee the logical continuity of the context.
The template-based method produces articles with a monotonous structure and is not suitable for popular-science insurance articles, while the deep-learning-based method is fully end-to-end and can hardly ensure that sentences connect coherently.
Aiming at the problems, the invention provides a brand-new article generation scheme.
Before the present invention is explained in detail, the terms and expressions used in the embodiments of the present invention are explained; the following explanations apply to them.
1) Abstract: a short description that compresses an article while retaining the main information of the original text.
2) Out-of-vocabulary (OOV) word: a word that does not appear in the vocabulary.
3) Attention mechanism: in an encoder-decoder framework, an attention model is added at the encoding end to apply a weighted transformation to the source data sequence, or introduced at the decoding end to apply a weighted change to the target data. This can effectively improve the performance of sequence-to-sequence systems and breaks through the limitation that the traditional encoder-decoder structure depends on an internal fixed-length vector during encoding and decoding. It works by retaining the intermediate outputs of the LSTM encoder over the input sequence, and then training a model to learn selectively from these inputs and to associate them with the output sequence as the model produces its output.
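To make the attention idea concrete, the following is a minimal sketch of one decoding step: the retained encoder outputs are scored against the current decoder state, the scores are normalized with a softmax, and the weighted sum forms a context vector. Dot-product scoring and the names `softmax` and `attend` are assumptions chosen for illustration; this document does not specify the scoring function.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into weights that sum to 1 (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: List[float],
           encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Assumed scoring function: dot product between decoder state and each encoder output.
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the encoder outputs.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

Because the context vector is recomputed at every decoding step, the decoder is no longer limited to a single fixed-length summary of the input.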
Referring to fig. 1, fig. 1 is a schematic structural diagram of a system for generating an original article according to an embodiment of the present invention. As shown in fig. 1, the system 100 for generating an original article includes: a server 11, a third party news platform 12, and an article publication platform 13.
The server 11 is communicatively connected with the third-party news platform 12 and the article publishing platform 13 to complete the testing of the car insurance service. For example, the third-party news platform 12 interfaces with the server 11; the server 11 sends a news request to the third-party news platform 12 to obtain third-party news, and the third-party news platform 12 processes the request and returns the corresponding news to the server 11.
In some embodiments, the number of the third party news platforms 12 may be one or multiple, the third party news platforms 12 may employ devices such as a computer terminal, a server, and a mobile terminal, and the third party news platform includes a third party news library for storing trending news information, so that the server obtains the trending news information. Preferably, the third-party news platform is a news website, and the news website includes a news library for storing popular news information.
There are a plurality of servers 11, which may constitute a server cluster; for example, the server cluster includes a first server, a second server, ..., and an Nth server. Alternatively, the server cluster may be a cloud computing service center that includes a plurality of servers. The servers serve as insurance company servers for interfacing with business personnel or developers of insurance companies.
In some embodiments, the server 11 may be preconfigured with a multi-layer software architecture. For example, the software architecture of the server 11 includes a service layer, a test layer and an interface layer, wherein the service layer is used for determining the service cooperation mode, the test layer is used for confirming test points and generating corresponding data, and the interface layer is used for interfacing with a server of the third-party news platform or the article publishing platform.
Referring to fig. 2, fig. 2 is a flowchart of a method for generating an original article according to an embodiment of the present invention;
as shown in fig. 2, the method for generating the original article includes:
step S10: acquiring a hot event;
Specifically, a search engine uses a web crawler to periodically capture popular search lists such as those of Baidu, Weibo (microblog) and Toutiao (headlines), automatically acquires hot events, for example hot news information, from the web pages of news websites, entertainment websites and the like, and generates a hot event list from the hot news information. Alternatively, the hot event list contained in a web page of a news website, entertainment website or the like is obtained, and the hot news information in that list is acquired, for example headline news, entertainment events, sports news, national events and international news.
In the embodiment of the present invention, the acquiring of popular news information includes the following steps:
(1) capturing web pages of websites such as news websites and entertainment websites;
Specifically, a search engine uses a web crawler to periodically capture popular search lists such as those of Baidu, Weibo (microblog) and Toutiao (headlines), automatically acquires hot events from the web pages of these news or entertainment websites, and integrates the hot events to generate a hot event list. The crawl proceeds as follows: first, a set of seed URLs is selected and put into the queue of URLs to be captured; each URL to be captured is then taken out, its host IP is obtained by DNS resolution, the corresponding web page is downloaded and stored in the downloaded web page library, and the URL is put into the captured URL queue. The pages of the URLs in the captured URL queue are then parsed to extract other URLs, which are put into the queue of URLs to be captured, and the next cycle begins.
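The crawl cycle described above (seed URLs, a queue of URLs to be captured, a captured set, and link extraction feeding the queue) can be sketched as follows. The function `crawl` and its parameters are hypothetical; `fetch` stands in for DNS resolution plus page download and `extract_links` for page parsing, so the sketch runs without any network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],                 # assumed: resolves the host and downloads the page
    extract_links: Callable[[str], List[str]],   # assumed: parses a page and returns its URLs
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl; returns the downloaded page library keyed by URL."""
    to_crawl = deque(seeds)          # queue of URLs to be captured
    crawled = set()                  # captured URL set
    pages: Dict[str, str] = {}       # downloaded web page library
    while to_crawl and len(pages) < max_pages:
        url = to_crawl.popleft()
        if url in crawled:
            continue                 # already captured in an earlier cycle
        html = fetch(url)
        pages[url] = html
        crawled.add(url)
        for link in extract_links(html):
            if link not in crawled:
                to_crawl.append(link)    # feeds the next cycle
    return pages
```

A real crawler would also need politeness delays, robots.txt handling and error handling, which are omitted here.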
(2) Storing data;
Specifically, the search engine stores the web pages crawled by the crawler in the original page database. The stored page data is identical to the HTML that a user's browser receives. When a search engine spider captures a page, it also performs a degree of duplicate content detection; if it finds a large amount of plagiarized, scraped or copied content on a website with a low access weight, the spider is likely to stop crawling that website.
(3) Preprocessing;
Specifically, the search engine preprocesses the pages captured by the crawler in several steps, for example: text extraction, Chinese word segmentation, noise removal (such as copyright notices, navigation bars and advertisements), index processing, link relation calculation, special file processing and the like.
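As an illustration of the text extraction and noise removal steps, the sketch below uses Python's standard `html.parser` to keep visible text while skipping content inside a few tags. The class name `TextExtractor`, the function `extract_text` and the choice of noise tags (`script`, `style`, `nav`, `footer`) are assumptions; real pages usually need site-specific rules, and Chinese word segmentation would be a separate step.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips content inside assumed noise tags."""

    NOISE_TAGS = {"script", "style", "nav", "footer"}  # assumption, not from the source

    def __init__(self):
        super().__init__()
        self._noise_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._noise_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page text with noise regions removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```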
(4) Providing retrieval service and website ranking;
Specifically, after organizing and processing the information, the search engine provides a keyword retrieval service for users and displays the information relevant to each query. Meanwhile, websites are ranked according to the PageRank value of their pages (a ranking derived from the links between pages).
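For readers unfamiliar with PageRank, a minimal power-iteration sketch is given below. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, not values taken from this document, and the function assumes every link target also appears as a key of `links`.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Compute PageRank by power iteration over a small link graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

Pages that are linked to by highly ranked pages end up with higher values, which is the property a ranking step would rely on.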
In the embodiment of the invention, the hot news information is automatically acquired in a web crawler mode, the acquisition speed of the hot news information can be improved, a large amount of hot news information can be acquired, and data support can be provided for subsequently acquiring news sets related to the hot events.
Step S20: screening the obtained hot events through a pre-trained theme analysis model, and determining the hot events related to a preset theme;
Specifically, the preset theme includes an insurance theme, and the theme analysis model includes an insurance theme analysis model. The acquired hot news information is screened through the insurance theme analysis model generated by offline pre-training, and the hot events related to the preset theme are determined; for example, the acquired hot news information is screened to determine the hot events related to the insurance theme. Suppose the acquired hot events include: a famous poet passed away; a worker was not entitled to hospitalization subsidies because he had not bought social security; and so on. The pre-trained theme analysis model filters the acquired hot news information by judging the relevance of each hot event to insurance, so as to determine the insurance-related hot events, for example: a famous poet passed away; a worker was not entitled to hospitalization subsidies because he had not bought social security.
In an embodiment of the present invention, the determining a topical event related to a preset theme includes:
and generating a theme category label, and identifying the hot event through the theme category label.
Specifically, after the acquired hot news information has been screened and a hot event related to the preset theme has been determined, the theme category corresponding to the hot event is determined, a theme category label is generated, and the hot event is identified by that label. For example, if the hot event is determined to be related to insurance, it is treated as an insurance-related event and is marked with the insurance type label corresponding to its theme category; for instance, the event "a famous poet passed away" is marked with the life insurance label.
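The labeling step can be pictured as attaching a category to each event that passes the screen. In the sketch below, `TaggedEvent`, `tag_events` and `keyword_classifier` are hypothetical names; in particular the keyword rules are only a toy stand-in so the example runs, whereas this document uses a trained topic analysis model for the classification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class TaggedEvent:
    title: str
    topic_label: str   # e.g. an insurance type such as "life insurance"


def tag_events(titles: Iterable[str],
               classify: Callable[[str], Optional[str]]) -> List[TaggedEvent]:
    """Keep only events the classifier assigns a category to, and attach the label."""
    tagged = []
    for title in titles:
        label = classify(title)
        if label is not None:
            tagged.append(TaggedEvent(title, label))
    return tagged


def keyword_classifier(title: str) -> Optional[str]:
    """Toy stand-in for the trained model: returns a label or None if unrelated."""
    rules = {"passed away": "life insurance", "hospitalization": "health insurance"}
    for keyword, label in rules.items():
        if keyword in title:
            return label
    return None
```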
The pre-trained topic analysis model is described below by taking a preset topic as an insurance topic and a topic analysis model as an insurance topic analysis model as an example:
The topic analysis model is based on a BERT pre-training model. BERT is a pre-trained model for word vectors and is a language model built on the bidirectional Transformer. On English datasets, BERT is provided in two sizes, Base and Large. Uncased means that all input words are converted to lower case, and Cased means that input words keep their original case (which is needed for tasks such as named entity recognition).
Referring to fig. 3, fig. 3 is a schematic diagram of a topic analysis model based on a BERT pre-training model according to an embodiment of the present invention;
As shown in FIG. 3, a single sentence (single sequence) is input, and the BERT model is trained with as few changes as possible; this training phase is called fine-tuning. The first input token is a special [CLS] token, where CLS stands for classification. Like the plain encoder of the Transformer, BERT takes a sequence of words as input, and the words flow upwards through its stack: each layer applies a self-attention mechanism, passes the result through a feed-forward network, and hands it to the next encoder layer. BERT is a self-supervised method for pre-training deep Transformers, which can be fine-tuned for different downstream tasks after pre-training. BERT optimizes two objectives, the masked language model (MLM) and next sentence prediction (NSP), both of which require only large unlabeled datasets for training.
For each sequence of words or sub-word units $X = (x_1, \ldots, x_n)$, BERT produces context-based vector representations through the encoder: $\mathbf{x}_1, \ldots, \mathbf{x}_n = \mathrm{enc}(x_1, \ldots, x_n)$. Since the BERT encoder uses a deep Transformer structure, the model uses positional embeddings $p_1, \ldots, p_n$ to mark the position of each word in the sequence.
The masked language model (MLM), also called a cloze test, predicts the missing word at a given position in a sequence. This step samples a subset Y of the words in X and replaces it with a different set of words. In BERT, Y accounts for 15% of X. Within Y, 80% of the words are replaced with [MASK], 10% are replaced with random words drawn from a unigram distribution, and 10% remain unchanged. The task is to predict the original words in Y from the modified input.
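The masking scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the BERT implementation: the function name `mask_tokens`, the `unigram_counts` dictionary, and the rule of selecting at least one position are assumptions introduced here.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, unigram_counts, rng, select_prob=0.15):
    """Corrupt a token list with the 15% / 80-10-10 scheme.

    Returns (corrupted_tokens, targets), where targets maps each selected
    position to the original token the model must predict.
    """
    n_select = max(1, round(len(tokens) * select_prob))  # assumption: at least one position
    positions = rng.sample(range(len(tokens)), n_select)
    words = list(unigram_counts)
    weights = list(unigram_counts.values())
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            corrupted[pos] = MASK  # 80%: replace with [MASK]
        elif r < 0.9:
            # 10%: replace with a word sampled from the unigram distribution
            corrupted[pos] = rng.choices(words, weights=weights, k=1)[0]
        # remaining 10%: leave the token unchanged
    return corrupted, targets


rng = random.Random(0)
tokens = [f"w{i}" for i in range(20)]
corrupted, targets = mask_tokens(tokens, {"a": 3, "b": 1}, rng)
```

Positions outside `targets` are never modified, so the prediction loss would only be computed over the selected 15%.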
Specifically, identifying insurance trending events is modeled as a short-text classification problem. Ten common insurance types are defined according to industry practice, such as health insurance, accident insurance, critical illness insurance, life insurance, property insurance, unemployment insurance, endowment insurance, car insurance, children's insurance, and travel insurance, and a large library of trending events is labeled to obtain training samples. Because the training samples are few, the Google BERT Chinese pre-training model is fine-tuned for the short-text classifier in order to improve the generalization ability of the model. The resulting classifier reaches an accuracy of 95% on the validation set and a recall of 93% for insurance events.
In the embodiment of the invention, the topic analysis model is trained in advance and the trending news information is screened through it, so that irrelevant trending events are filtered out and only the trending events related to the preset topic are retained. This reduces the number of trending events unrelated to the preset topic that must be processed and improves the processing speed of the system.
Step S30: acquiring a news set related to the hot event related to the preset theme according to the hot event related to the preset theme;
Specifically, according to the trending event, a news set related to the trending event is acquired from the original page database in which the trending news information collected by the web crawler is stored, or a set of news articles related to the trending event is captured from a third-party news database by using the web crawler. For example, a set of related news items is obtained from news websites by using the query "a famous poet has passed away".
In this embodiment of the present invention, the acquiring the news set related to the trending event related to the preset topic includes:
acquiring a news set strongly related to the trending event. Specifically, news information strongly related to the trending event is acquired by a web crawler according to the title of the trending event, and the acquired news items are aggregated to generate the news set. Acquiring the news information strongly related to the trending event includes calculating a correlation coefficient between the trending event and each news item, where the correlation coefficient reflects the degree of linear correlation between the two. A news item is strongly related to the trending event when this correlation coefficient is greater than a preset threshold. For example, with a preset threshold of 0.9, a news item whose correlation coefficient with the trending event is greater than 0.9 is determined to be strongly related; it is then acquired, and the acquired news items are collected to generate the news set.
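The text does not say how the event and a news item are turned into numbers before the correlation coefficient is computed. The following minimal sketch assumes, purely for illustration, that both are represented as term-frequency vectors over their shared vocabulary and that the Pearson coefficient is used; the whitespace tokenizer and all function names are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated here
    return cov / (sx * sy)


def correlation(event_title, news_text):
    # Assumption: term-frequency vectors over the joint vocabulary.
    a = Counter(event_title.lower().split())
    b = Counter(news_text.lower().split())
    vocab = sorted(set(a) | set(b))
    return pearson([a[w] for w in vocab], [b[w] for w in vocab])


def strongly_related(event_title, candidates, threshold=0.9):
    """Keep only the news items whose coefficient exceeds the preset threshold."""
    return [c for c in candidates if correlation(event_title, c) > threshold]
```

With this representation a long article rarely correlates above 0.9 with a short title, so a practical system would likely compare titles with titles or use a different text representation.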
In the embodiment of the invention, by acquiring the news set related to the hot event, the news information in the news set is conveniently screened, and the article can be better generated.
Step S40: generating an event abstract of the hot event related to the preset theme according to the news set;
Specifically, the news set includes a plurality of news articles and/or news items; summary information is obtained from these news articles and/or news items and is used to generate the event summary of the trending event.
Specifically, referring back to fig. 4, fig. 4 is a detailed flowchart of step S40 in fig. 2;
Because abstracts produced by deep-model-based abstractive summarization generally suffer from incoherent context, the invention obtains the article abstract by an extractive method in order to improve context coherence.
As shown in fig. 4, step S40: generating an event summary of the trending event according to the news collection, including:
step S41: extracting the original abstract related to the news event from the news set based on a pre-trained multi-document abstract extraction model to generate a candidate abstract set;
The article abstracts are extracted through the multi-document abstract extraction model, and the news set includes a plurality of news articles and/or news items. Specifically, extracting the original abstract related to the news event from the news set based on the pre-trained multi-document abstract extraction model to generate the candidate abstract set includes the following steps:
(1) acquiring keywords of the news event from the news set based on a TF-IDF keyword extraction algorithm;
(2) adding weights to the keywords of the news events to generate a weighted keyword set;
(3) extracting key sentences of each news article or news information based on a TextRank algorithm;
(4) using the continuous text segments containing all key sentences as article abstracts of the news articles or news information to generate a candidate abstract set of the news events;
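Steps (1) to (4) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the patented model: the tokenizer is a plain regular expression (Chinese text would need a word segmenter), no stop-word filtering is applied, the keyword weight is simply the summed TF-IDF score, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Regex tokenizer for illustration only.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_keywords(docs, top_k=5):
    """Steps (1)-(2): map each top keyword to a weight (its summed TF-IDF score)."""
    doc_tokens = [tokenize(d) for d in docs]
    df = Counter(t for toks in doc_tokens for t in set(toks))
    scores = Counter()
    for toks in doc_tokens:
        for term, count in Counter(toks).items():
            idf = math.log((1 + len(docs)) / (1 + df[term])) + 1  # smoothed IDF
            scores[term] += count / len(toks) * idf
    return dict(scores.most_common(top_k))


def textrank_sentences(sentences, top_n=2, damping=0.85, iterations=30):
    """Step (3): indices of the top_n sentences, ranked by PageRank on a word-overlap graph."""
    sets = [set(tokenize(s)) for s in sentences]
    n = len(sentences)

    def sim(i, j):
        if i == j or len(sets[i]) < 2 or len(sets[j]) < 2:
            return 0.0
        return len(sets[i] & sets[j]) / (math.log(len(sets[i])) + math.log(len(sets[j])))

    w = [[sim(i, j) for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    rank = [1.0] * n
    for _ in range(iterations):
        rank = [
            (1 - damping)
            + damping * sum(w[j][i] * rank[j] / out[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return sorted(sorted(range(n), key=lambda i: rank[i], reverse=True)[:top_n])


def extract_summary(article, top_n=2):
    """Step (4): the contiguous span of text that covers every key sentence."""
    sentences = split_sentences(article)
    if not sentences:
        return ""
    key = textrank_sentences(sentences, top_n)
    return " ".join(sentences[key[0]: key[-1] + 1])
```

Calling `extract_summary` on every article in the news set yields the candidate abstract set, and `tfidf_keywords` over the same set yields the weighted keywords used later to choose among the candidates.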
step S42: and generating an event summary of the hot event according to the candidate summary set.
Specifically, the candidate summary set includes a plurality of candidate summaries, so that a unique event summary needs to be determined according to the plurality of candidate summaries, specifically, please refer to fig. 5 again, where fig. 5 is a detailed flowchart of step S42 in fig. 4;
as shown in fig. 5, step S42: the generating the event summary of the hot event according to the candidate summary set comprises:
step S421: determining a summary to be rewritten from the candidate summary set;
Specifically, determining the abstract to be rewritten from the candidate abstract set includes: for the article abstract of each news article or news item, computing the cumulative sum of the weights of the keywords it contains, sorting the abstracts by this sum from high to low, and taking the article abstract with the highest score as the abstract to be rewritten for the trending event.
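A minimal sketch of this selection rule is shown below. It assumes that a keyword is counted once per abstract when it is present, and the function and variable names are hypothetical.

```python
import re


def keyword_score(summary, keyword_weights):
    """Sum the weights of the keywords that occur in the summary."""
    tokens = set(re.findall(r"\w+", summary.lower()))
    return sum(weight for keyword, weight in keyword_weights.items() if keyword in tokens)


def select_summary_to_rewrite(candidates, keyword_weights):
    """Rank candidate abstracts by cumulative keyword weight and return the best one."""
    ranked = sorted(candidates, key=lambda s: keyword_score(s, keyword_weights), reverse=True)
    return ranked[0]


weights = {"storm": 0.5, "claims": 0.3, "coast": 0.2}
candidates = [
    "Officials opened shelters.",
    "The storm hit the coast and insurers expect claims.",
]
best = select_summary_to_rewrite(candidates, weights)
```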
Step S422: and rewriting the abstract to be rewritten based on a pre-trained synonym sentence rewriting model to generate the event abstract of the hot event.
Specifically, a deep-learning-based synonymous sentence rewriting model is trained in advance and is used to rewrite the abstract to be rewritten, generating the event summary of the trending event. For example, a synonymous sentence rewriting model based on a pointer-generator network is trained in advance and used to rewrite the abstract to be rewritten, generating the event summary of the trending event.
In the embodiment of the invention, the original abstract in the news set is extracted based on the pre-trained multi-document abstract extraction model to generate the candidate abstract set, the abstract to be rewritten is determined from the candidate abstract set, and the abstract to be rewritten is rewritten based on the pre-trained synonym sentence rewriting model, so that the event abstract of the hot event is generated, and the context coherence can be ensured.
Step S50: and acquiring a subject short text based on a pre-generated subject knowledge base, and generating the original article by combining the event abstract.
Specifically, the subject knowledge base includes an insurance knowledge base, and the insurance knowledge base includes a plurality of subject short texts. For example, the short texts are insurance popular-science essays, such as essays on claims procedures, policy terms, and points to note when taking out insurance. The popular-science essays in the subject knowledge base carry theme category tags, such as insurance type tags. Specifically, referring to fig. 6, fig. 6 is a detailed flowchart of step S50 in fig. 2. As shown in fig. 6, step S50, acquiring a subject short text based on a pre-generated subject knowledge base and generating the original article by combining the event abstract, includes the following steps:
step S51: according to the theme category label, acquiring theme texts related to the theme category from the theme knowledge base;
Specifically, a topic knowledge base is generated in advance. The topic knowledge base includes a plurality of topic short texts, each carrying a theme category tag. According to the theme category tag corresponding to the trending event, the topic short texts related to that theme category are obtained from the topic knowledge base, that is, the short texts in the knowledge base that carry the same theme category tag. For example, if the topic knowledge base is an insurance knowledge base and the theme category tag corresponding to the trending event is a car insurance tag, the short texts carrying the car insurance tag are obtained from the insurance knowledge base.
Step S52: automatically splicing the event abstract and the subject short texts to generate the original article;
specifically, the subject essay carrying the subject type label is screened out from the subject knowledge base, and the event summary and the subject essay are spliced to generate the original article.
It is understood that the original article further includes a title, and the title of the original article is a title of the trending event, for example: headlines of trending news information obtained from third party news platforms.
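Steps S51 and S52 amount to a tag lookup followed by concatenation. The sketch below illustrates this; the `Essay` record, the in-memory list used as the knowledge base, and the blank-line layout of the spliced article are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Essay:
    tag: str   # theme category tag, e.g. an insurance type
    text: str  # popular-science short text


def essays_for_tag(knowledge_base, tag):
    """Step S51: the short texts that carry the given theme category tag."""
    return [essay for essay in knowledge_base if essay.tag == tag]


def splice_article(title, event_summary, essay_text):
    """Step S52: title of the trending event, then the event summary, then the short text."""
    return f"{title}\n\n{event_summary}\n\n{essay_text}"


knowledge_base = [
    Essay("car insurance", "How to file a car insurance claim."),
    Essay("life insurance", "What a life insurance policy covers."),
]
matches = essays_for_tag(knowledge_base, "car insurance")
article = splice_article("Highway pile-up", "Ten cars collided this morning.", matches[0].text)
```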
In this embodiment of the present invention, after obtaining the theme texts related to the theme categories from the theme knowledge base according to the theme category tags, the method further includes:
screening out the unique optimal short texts from the obtained at least two subject short texts;
It can be understood that the topic knowledge base may include several topic short texts related to the theme category, that is, at least two short texts carrying the theme category tag corresponding to the theme category are stored in the topic knowledge base. In this case a single short text must be chosen for splicing with the event summary to generate the original article, so a unique optimal short text is screened out from the at least two obtained topic short texts. Specifically, this screening includes: acquiring all the topic short texts related to the theme category, determining keywords, searching the content of every topic short text for the keywords, ranking the short texts from high to low by how often the keywords occur in their content, and taking the top-ranked short text as the unique optimal short text. In the embodiment of the present invention, there may be one or more keywords, and each keyword may carry a weight. For example, the keywords include a first keyword A with a weight of 60% and a second keyword B with a weight of 40%; the occurrence counts of A and B in each topic short text are combined with their weights in a weighted sum, and the short text with the largest weighted sum is determined to be the unique optimal short text.
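The weighted-sum rule in the example above (keyword A at 60%, keyword B at 40%) can be sketched as follows. Substring counting is used here as an assumption because it also works for unsegmented Chinese text; the function names are hypothetical.

```python
def weighted_keyword_score(text, keyword_weights):
    """Weighted sum of keyword occurrence counts in the text."""
    return sum(weight * text.count(keyword) for keyword, weight in keyword_weights.items())


def pick_best_essay(essays, keyword_weights):
    """Return the short text with the largest weighted sum."""
    return max(essays, key=lambda text: weighted_keyword_score(text, keyword_weights))


weights = {"claim": 0.6, "deductible": 0.4}
essays = [
    "claim claim deductible",
    "deductible deductible deductible",
]
best = pick_best_essay(essays, weights)
```

Here the first essay scores 0.6 × 2 + 0.4 × 1 = 1.6 and the second scores 0.4 × 3 = 1.2, so the first is selected.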
And rewriting the optimal short texts based on a pre-trained synonym sentence rewriting model to generate the event short texts of the hot events.
Specifically, the synonym rewrite model is a deep learning-based synonym rewrite model, and the optimal short text is rewritten based on a pre-trained synonym rewrite model to generate the event short text of the hot event, including the following steps:
(1) constructing a synonym parallel corpus;
Specifically, by means of a translation application, each sentence in the published news database is translated twice, first from Chinese into English and then from English back into Chinese, to construct a very large number of synonymous sentence pairs; the sentence pairs obtained in this way are noisy.
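The round-trip translation step can be sketched as below. The text does not name a translation service, so the two translator callables are hypothetical and are passed in by the caller; dropping pairs whose back-translation is identical to the source is likewise an assumption.

```python
def build_paraphrase_pairs(sentences, zh_to_en, en_to_zh):
    """Create noisy (source, paraphrase) pairs by Chinese -> English -> Chinese translation."""
    pairs = []
    for sentence in sentences:
        back_translated = en_to_zh(zh_to_en(sentence))
        # Assumption: an unchanged sentence teaches the rewriter nothing, so skip it.
        if back_translated and back_translated != sentence:
            pairs.append((sentence, back_translated))
    return pairs


# Stub translators standing in for a real translation application.
zh_en = {"今天下雨": "It is raining today", "你好": "hello"}
en_zh = {"It is raining today": "今天正在下雨", "hello": "你好"}
pairs = build_paraphrase_pairs(["今天下雨", "你好"], zh_en.get, en_zh.get)
```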
(2) Generating a synonymy sentence rewriting model based on a pointer generation network;
referring to fig. 7, fig. 7 is a schematic diagram of a Pointer-Generator Networks network structure according to an embodiment of the present invention;
As shown in fig. 7, the left side is the encoder and the right side is the decoder. Each step of the encoder input first goes through embedding, and the embedded input is fed into an LSTM. The output of every LSTM step is used: each encoder step has a coefficient α, and the step outputs are weighted by these coefficients. α may be a scalar or a vector; a vector α is multiplied element-wise with the LSTM output vector, a scalar α is multiplied with the vector directly, and α must be normalized. The output of each encoder LSTM step is multiplied by its α and all the products are added together, which is a weighted averaging process. The vector obtained from the weighted averaging is input into the second unit of the decoder. The output of the first decoder step takes part in computing α: it is combined with each LSTM output of the encoder to calculate the α values, so the information from the first decoder step determines the weighted average of the encoder outputs that is passed to the second step. The same calculation is repeated at each subsequent decoder step.
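The weighted averaging described above can be sketched for a single decoder step. The sketch uses a scalar α per encoder step, a dot product as the scoring function, and softmax as the normalization; the text does not specify the scoring function, so the dot product is an assumption, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores so they are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_outputs):
    """Return (alphas, context) for one decoder step.

    alphas[i] is the normalized coefficient of encoder step i, and context
    is the weighted average of the encoder LSTM outputs.
    """
    # Assumption: dot-product scoring between the decoder state and each encoder output.
    scores = [sum(d * h for d, h in zip(decoder_state, out)) for out in encoder_outputs]
    alphas = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(a * out[k] for a, out in zip(alphas, encoder_outputs)) for k in range(dim)]
    return alphas, context


alphas, context = attention_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy input the decoder state points in the same direction as the first encoder output, so the first coefficient is the larger of the two.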
The pointer-generator network (Pointer-Generator Networks) is a hybrid of a baseline sequence-to-sequence model and a pointer network: it can copy words from the source text by pointing, and it can also generate words from a fixed vocabulary. Compared with other synonymous sentence rewriting models, the rewriting model based on the pointer-generator network therefore has the following advantages:
firstly, it retains the attention mechanism, so long texts can be encoded;
secondly, in addition to the attention mechanism, a coverage mechanism is added. Plain attention does not account for repetition, so redundancy handling is added on top of attention: words already present in the output receive less attention in later steps, which avoids the word repetition problem that frequently occurs in other models;
thirdly, for out-of-vocabulary (OOV) words, a pointer mechanism and a generator mechanism are combined. The pointer mechanism copies words from the original text, and the generator mechanism selects the most suitable word from the vocabulary. The pointer mechanism effectively solves the problem of proper nouns such as personal names, organizations, code numbers, times and dates, and place names being rewritten incorrectly, while the generator mechanism ensures that high-frequency words are rewritten reasonably with higher probability. The rewriting model based on the pointer-generator network performs well: in a sampled evaluation of rewritten sentences, the proportion of high-quality rewrites is as high as 95%, and the poor results are concentrated in cases where the rewriting granularity is small.
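How the pointer and generator mechanisms combine can be illustrated with the final word distribution used in pointer-generator networks, P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ a_i over the source positions i where the word w occurs, together with the coverage penalty Σ min(a_i, c_i). The sketch below takes these formulas from the published pointer-generator design rather than from the text above, and the function names and toy numbers are illustrative assumptions.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generator's vocabulary distribution with copy probabilities.

    p_gen is the probability of generating from the vocabulary, and
    attention[i] is the attention weight on source_tokens[i].
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Copy probability is added even for tokens outside the vocabulary (OOV).
        final[token] = final.get(token, 0.0) + (1 - p_gen) * weight
    return final


def coverage_loss(attention, coverage):
    """Penalize attending again to positions that earlier steps already covered."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


dist = final_distribution(
    p_gen=0.4,
    vocab_dist={"the": 0.5, "insurer": 0.5},
    attention=[0.9, 0.1],
    source_tokens=["Zhang", "insurer"],
)
```

The name "Zhang" is not in the fixed vocabulary, yet it receives probability mass through copying, which is how proper nouns survive the rewrite unchanged.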
In the embodiment of the invention, the optimal short text is rewritten by adopting a synonym sentence rewriting model based on a pointer generation network to generate the event short text of the hot event, and the event abstract of the hot event is spliced with the event short text to generate an original article, so that better rewriting can be performed to generate a high-quality event short text.
Referring to fig. 8 again, fig. 8 is another flowchart of a method for generating an original article according to an embodiment of the present invention;
as shown in fig. 8, the method for generating the original article includes:
step S801: hot event monitoring;
Specifically, a web crawler is used to periodically capture trending search lists such as those of Baidu, Weibo, and Toutiao, so that trending news information is obtained automatically from the web pages of news websites, entertainment websites, and similar sites, and a trending event list is generated from it. Alternatively, the trending event list published on the web pages of such websites is obtained, and the trending news information in the list is acquired based on it, for example: headline news, entertainment events, sports news, national events, and international news.
Step S802: identifying insurance trending events;
specifically, the relevance of the hot event and the insurance is judged by utilizing an offline pre-trained insurance theme model, and an insurance risk type label is marked aiming at the insurance-related event.
Step S803: capturing the insurance hot news;
Specifically, a web crawler is used to retrieve and capture a set of news articles related to the insurance trending event from a third-party news database.
Step S804: extracting the insurance hot news abstract;
specifically, an event abstract capable of accurately summarizing news events is obtained from a news set of popular events by using an offline pre-trained multi-document abstract extraction model.
Step S805: rewriting the abstract into an original abstract;
specifically, each sentence of the event summary is rewritten into a synonymous sentence by the offline pre-trained synonymous sentence rewriting model, which keeps each sentence fluent and semantically equivalent to the original sentence, so as to obtain the original abstract of the insurance event.
Step S806: splicing the articles;
specifically, an insurance science popularization text matched with the label is randomly screened out from an insurance knowledge base by using an insurance risk type label of the event, and the original abstract and the science popularization text are spliced to obtain an insurance science popularization original article based on the hot news event.
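Steps S801 to S806 can be read as a single pipeline. The sketch below only shows how the stages connect: every stage is injected as a callable, and all names (`generate_articles`, `classify`, `fetch_news`, `summarize`, `rewrite`) are hypothetical placeholders for the models and crawlers described above.

```python
import random


def generate_articles(hot_events, classify, fetch_news, summarize, rewrite, knowledge_base, rng):
    """Run the six steps for every trending event title and return the spliced articles."""
    articles = []
    for title in hot_events:              # S801: monitored trending events
        tag = classify(title)             # S802: insurance type tag, or None if unrelated
        if tag is None:
            continue
        news_set = fetch_news(title)      # S803: related news articles
        summary = summarize(news_set)     # S804: extracted event summary
        original = rewrite(summary)       # S805: synonymous rewriting
        essays = knowledge_base.get(tag, [])
        if not essays:
            continue
        essay = rng.choice(essays)        # S806: random essay matching the tag
        articles.append(f"{title}\n\n{original}\n\n{essay}")
    return articles


articles = generate_articles(
    hot_events=["Highway crash", "Film premiere"],
    classify=lambda title: "car insurance" if "crash" in title else None,
    fetch_news=lambda title: [title + " report 1", title + " report 2"],
    summarize=lambda news: f"summary of {len(news)} items",
    rewrite=lambda text: "rewritten: " + text,
    knowledge_base={"car insurance": ["Car insurance basics."]},
    rng=random.Random(0),
)
```

Passing the stages in as callables keeps the orchestration independent of any particular crawler or model, so each stage can be replaced or tested separately.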
In an embodiment of the present invention, a method for generating an original article is provided, including: acquiring hot news information; screening the obtained trending news information through a pre-trained theme analysis model, and determining trending events related to preset themes; acquiring a news set related to the hot event according to the hot event; generating an event abstract of the hot event according to the news set; and generating the original article by combining the event abstract based on a pre-generated topic knowledge base. On the one hand, because the method follows trending news information, the generated article tracks current events and resonates easily with readers; after publication, its read count and share count have obvious advantages over those of a purely knowledge-based insurance popular-science article. On the other hand, by generating an event abstract of the trending event, the method ensures that the generated article is original and can pass the originality checks of platforms such as Baidu and Toutiao. Finally, the method generates the original article automatically; manual editing, by comparison, is inefficient and costly, whereas this tool can seize a news hotspot at the first moment.
Referring to fig. 9, fig. 9 is an interaction diagram of a system for generating an original article according to an embodiment of the present invention;
as shown in fig. 9, the system for generating the original article includes a server, a third-party news platform, and an article publishing platform, where the interaction process includes:
step S901: sending a news information request;
specifically, the server sends a news information request to the third-party news platform, so that the third-party news platform returns corresponding news information to the server based on the news information request.
Step S902: exporting hot news information;
specifically, the third-party news platform exports the trending news information according to the news information request and sends it to the server.
Step S903: generating an original article;
specifically, the server acquires hot news information;
screening the obtained trending news information through a pre-trained theme analysis model, and determining trending events related to preset themes; acquiring a news set related to the hot event according to the hot event; generating an event abstract of the hot event according to the news set; and generating the original article by combining the event abstract based on a pre-generated topic knowledge base.
Step S904: sending an original article;
specifically, the server sends the original article to the article publishing platform, so that the article publishing platform publishes the original article to its own platform based on the original article.
Step S905: releasing original articles;
Specifically, the article publishing platform publishes the original article sent by the server. The article publishing platform includes WeChat official accounts, Toutiao accounts, Baijiahao accounts, Penguin accounts, and the like, or other insurance information platforms.
The server publishes the original article to the article publishing platform, so the generated insurance article can be published directly to self-media platforms such as WeChat official accounts, Toutiao accounts, Baijiahao accounts, and Penguin accounts, or to other insurance information platforms. The generated article focuses on current news hotspots that are strongly related to insurance; it is realistic and relatable and easily resonates with readers. Compared with dry, preachy education, it is of significant value for raising users' insurance awareness and spreading insurance knowledge.
In an embodiment of the present invention, a method for generating an original article is provided, including: acquiring hot news information; screening the obtained trending news information through a pre-trained theme analysis model, and determining trending events related to preset themes; acquiring a news set related to the hot event according to the hot event; generating an event abstract of the hot event according to the news set; and generating the original article by combining the event abstract based on a pre-generated topic knowledge base. By generating the event abstract based on the news set, the invention can accurately summarize the event abstract of the hot event, and generate the original article based on the pre-generated theme knowledge base and combining the event abstract, thereby improving the context consistency of the article.
Referring to fig. 10 again, fig. 10 is a schematic structural diagram of an original article generating device according to an embodiment of the present invention;
as shown in fig. 10, the original article generation apparatus 80 includes:
a hot event acquiring unit 801 for acquiring a hot event;
a hot event determining unit 802, configured to screen the obtained hot events through a pre-trained topic analysis model, and determine a hot event related to a preset topic;
a news set obtaining unit 803, configured to obtain, according to the trending event related to the preset theme, a news set related to the trending event related to the preset theme;
an event summary generating unit 804, configured to generate an event summary of the trending event related to the preset topic according to the news set;
an original article generating unit 805, configured to acquire a subject short text based on a pre-generated subject knowledge base, and generate the original article by combining the event summary.
In this embodiment of the present invention, the hot event determining unit 802 is specifically configured to:
and generating a theme category label, and identifying the hot event through the theme category label.
In some embodiments, the event summary generation unit 804 includes:
a candidate abstract collecting module 8041, configured to extract, based on a pre-trained multi-document abstract extracting model, an original abstract related to the news event from the news collection, and generate a candidate abstract collection;
an event summary generating module 8042, configured to generate an event summary of the trending event according to the candidate summary set.
In some embodiments, the event summary generation module 8042 is specifically configured to:
determining a summary to be rewritten from the candidate summary set;
and rewriting the abstract to be rewritten based on a pre-trained synonym sentence rewriting model to generate the event abstract of the hot event.
In some embodiments, the original article generating unit 805 includes:
a topic short article obtaining module 8051, configured to obtain, according to the topic category tag, a topic short article related to the topic category from the topic knowledge base;
an original article generating module 8054, configured to automatically splice the event summary and the subject essay, and generate the original article.
In some embodiments, the original article generating unit 805 further includes:
an optimal short text generation module 8052, configured to screen out a unique optimal short text from the obtained at least two subject short texts;
an event short text generating module 8053, configured to rewrite the optimal short text based on a pre-trained synonym rewriting model, and generate an event short text of the hot event.
In some embodiments, the original article generating module 8054 is specifically configured to:
and automatically splicing the event abstract and the event short text to generate the original article.
In some embodiments, the preset theme comprises an insurance theme, and the theme repository comprises an insurance repository.
In an embodiment of the present invention, an apparatus for generating an original article is applied to a server, and the apparatus for generating an original article includes: a hot event acquisition unit for acquiring a hot event; the hot event determining unit is used for screening the obtained hot events through a pre-trained theme analysis model and determining the hot events related to a preset theme; a news set obtaining unit, configured to obtain, according to the trending event related to the preset theme, a news set related to the trending event related to the preset theme; the event abstract generating unit is used for generating an event abstract of the hot event related to the preset theme according to the news set; and the original article generating unit is used for acquiring the subject short articles based on a pre-generated subject knowledge base and generating the original articles by combining the event abstract. On one hand, hot news information is screened through a pre-trained topic analysis model to determine a hot event related to a preset topic, and on the other hand, a news set related to the hot event is acquired and an event abstract is generated based on the news set, so that the event abstract of the hot event can be accurately summarized, and an original article is generated based on a pre-generated topic knowledge base and the event abstract, so that the context coherence of the article can be improved.
Referring to fig. 11 again, fig. 11 is a schematic structural diagram of a server according to an embodiment of the present invention;
as shown in fig. 11, the server 110 includes one or more processors 111 and memory 112. In fig. 11, one processor 111 is taken as an example.
The processor 111 and the memory 112 may be connected by a bus or other means, such as the bus connection in fig. 11.
The memory 112, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the original article generation method in the embodiment of the present invention. The processor 111 executes various functional applications and data processing by running the non-volatile software programs, instructions, and modules stored in the memory 112, that is, it implements the original article generation method provided by the method embodiment and the functions of the modules or units of the apparatus embodiment.
The memory 112 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 112 may optionally include memory located remotely from the processor 111, which may be connected to the processor 111 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 112 and, when executed by the one or more processors 111, perform the method of generating an original article in any of the method embodiments described above.
The server 110 of the embodiments of the present invention exists in a variety of forms; it performs the steps described above and shown in fig. 2 and can also implement the functions of the units described above. The server 110 includes but is not limited to:
(1) tower server
An ordinary tower server chassis is about as large as a common PC chassis, while a large tower chassis is much larger; there is no fixed standard for the overall dimensions.
(2) Rack-mounted server
A rack-mounted server is a server with a standardized form factor, designed for dense deployment in enterprises: it fits a standard 19-inch rack and has a height from 1U to several U. Placing servers in racks not only facilitates routine maintenance and management but can also prevent unexpected failures. First, the servers do not take up much space: rack servers are arranged in the rack in order, and no space is wasted. Second, connecting cables can be stored neatly in the rack. Power cables, LAN cables, and the like can be routed inside the cabinet, which reduces the cables piled up on the floor and prevents accidents such as a cable being kicked out. The specified dimensions are the width (48.26 cm = 19 inches) and the height (multiples of 4.445 cm) of the server. Because of the 19-inch width, a rack that meets this specification is sometimes referred to as a "19-inch rack".
(3) Blade server
A blade server is a HAHD (High Availability High Density) low-cost server platform designed for special-purpose industry applications and high-density computing environments, in which each "blade" is actually a system motherboard, similar to an individual server. In this mode, each motherboard runs its own system and serves a designated group of users, independently of the others. System software may, however, be used to group these motherboards into a server cluster. In cluster mode, all motherboards can be connected to provide a high-speed network environment, and resources can be shared to serve the same user group.
(4) Cloud server
The cloud server (Elastic Compute Service, ECS) is a simple, efficient, secure and reliable computing service with flexible processing capacity. It is simpler and more efficient to manage than a physical server, and a user can quickly create or release any number of cloud servers without purchasing hardware in advance. The distributed storage of the cloud server integrates a large number of servers into one supercomputer and provides large-scale data storage and processing services. The distributed file system and the distributed database allow access to common storage resources, enabling IO sharing of application data files. Virtual machines can break through the limits of a single physical machine and dynamically adjust and allocate resources to eliminate single points of failure in servers and storage devices, achieving high availability.
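The rack dimensions given above for rack-mounted servers reduce to simple arithmetic: a server occupying n rack units is n × 4.445 cm high, and the 48.26 cm width corresponds to 19 inches. A minimal sketch follows; the constant and function names are illustrative and do not appear in the original disclosure.

```python
# Only the 4.445 cm and 48.26 cm figures come from the text;
# all names below are illustrative assumptions.
RACK_UNIT_CM = 4.445    # height of one rack unit (1U)
RACK_WIDTH_CM = 48.26   # standard rack width
CM_PER_INCH = 2.54


def server_height_cm(units: int) -> float:
    """Return the height in centimetres of a server occupying `units` rack units."""
    if units < 1:
        raise ValueError("a rack-mounted server occupies at least 1U")
    return units * RACK_UNIT_CM


def cm_to_inches(cm: float) -> float:
    """Convert centimetres to inches."""
    return cm / CM_PER_INCH


if __name__ == "__main__":
    print("rack width in inches:", round(cm_to_inches(RACK_WIDTH_CM), 2))
    for n in (1, 2, 4):
        print(f"{n}U height in cm:", round(server_height_cm(n), 3))
```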
Embodiments of the present invention also provide a non-transitory computer storage medium storing computer-executable instructions which, when executed by one or more processors, such as the processor 111 in FIG. 11, cause the one or more processors to perform the method for generating an original article in any of the method embodiments described above.
An embodiment of the present invention further provides a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a server, the server executes the method for generating the original article.
The above-described embodiments of the apparatus or device are merely illustrative. The unit modules described as separate parts may or may not be physically separate, and the parts displayed as module units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network module units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Within the idea of the invention, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (11)
1. A method for generating an original article, characterized by comprising the following steps:
acquiring hot events;
screening the acquired hot events through a pre-trained theme analysis model, and determining a hot event related to a preset theme;
acquiring, according to the hot event related to the preset theme, a news set related to that hot event;
generating an event summary of the hot event related to the preset theme according to the news set;
and acquiring a theme short text based on a pre-generated theme knowledge base, and generating the original article by combining the theme short text with the event summary.
2. The method of claim 1, wherein determining the hot event related to the preset theme comprises:
generating a theme category label, and identifying the hot event through the theme category label.
3. The method of claim 1, wherein generating the event summary of the hot event related to the preset theme according to the news set comprises:
extracting original summaries related to the news event from the news set based on a pre-trained multi-document summary extraction model to generate a candidate summary set;
and generating the event summary of the hot event related to the preset theme according to the candidate summary set.
4. The method of claim 3, wherein generating the event summary of the hot event related to the preset theme according to the candidate summary set comprises:
determining a summary to be rewritten from the candidate summary set;
and rewriting the summary to be rewritten based on a pre-trained synonym sentence rewriting model to generate the event summary of the hot event related to the preset theme.
5. The method of claim 2, wherein acquiring the theme short text based on the pre-generated theme knowledge base and generating the original article by combining the theme short text with the event summary comprises:
acquiring, according to the theme category label, theme short texts related to the theme category from the theme knowledge base;
and automatically splicing the event summary and the theme short texts to generate the original article.
6. The method of claim 5, wherein after acquiring the theme short texts related to the theme category from the theme knowledge base according to the theme category label, the method further comprises:
screening out a single optimal short text from at least two acquired theme short texts;
and rewriting the optimal short text based on a pre-trained synonym sentence rewriting model to generate an event short text of the hot event.
7. The method of claim 6, wherein automatically splicing the event summary and the theme short texts to generate the original article comprises:
automatically splicing the event summary and the event short text to generate the original article.
8. The method of any one of claims 1-7, wherein the preset theme comprises an insurance theme and the theme knowledge base comprises an insurance knowledge base.
9. An apparatus for generating an original article, comprising:
a hot news information acquisition unit, configured to acquire hot events;
a hot event determining unit, configured to screen the acquired hot events through a pre-trained theme analysis model and determine a hot event related to a preset theme;
a news set obtaining unit, configured to obtain, according to the hot event related to the preset theme, a news set related to that hot event;
an event summary generating unit, configured to generate an event summary of the hot event related to the preset theme according to the news set;
and an original article generating unit, configured to acquire a theme short text based on a pre-generated theme knowledge base and generate the original article by combining the theme short text with the event summary.
10. A server, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating original articles of any of claims 1-8.
11. A system for generating an original article, comprising:
the server of claim 10;
a third-party news platform, communicatively connected to the server and comprising a third-party news library for storing hot news information, so that the server can acquire the hot news information;
and an article publishing platform, communicatively connected to the server and configured to publish the original article generated by the server.
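The method steps recited in claims 1 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the pre-trained theme analysis model is replaced by keyword matching, the multi-document summary extraction model by first-sentence extraction, and the synonym sentence rewriting model by an identity function. All names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class HotEvent:
    title: str
    news: List[str]  # news set gathered for this hot event


def theme_label(event: HotEvent, keywords: Dict[str, List[str]]) -> Optional[str]:
    """Stand-in for the theme analysis model: keyword match on the title."""
    text = event.title.lower()
    for label, words in keywords.items():
        if any(word in text for word in words):
            return label
    return None


def candidate_summaries(news: List[str]) -> List[str]:
    """Stand-in for multi-document extraction: first sentence of each news item."""
    return [item.split(".")[0].strip() + "." for item in news if item.strip()]


def identity_rewrite(sentence: str) -> str:
    """Stand-in for the synonym sentence rewriting model: returns the input unchanged."""
    return sentence


def generate_articles(
    events: List[HotEvent],
    keywords: Dict[str, List[str]],
    knowledge_base: Dict[str, List[str]],
    rewrite: Callable[[str], str] = identity_rewrite,
) -> List[str]:
    articles = []
    for event in events:
        label = theme_label(event, keywords)        # screen by preset theme
        if label is None:
            continue
        candidates = candidate_summaries(event.news)
        short_texts = knowledge_base.get(label, [])
        if not candidates or not short_texts:
            continue
        # Assumption: the longest candidate is the summary to be rewritten.
        event_summary = rewrite(max(candidates, key=len))
        # Assumption: the first knowledge-base entry is the optimal short text.
        event_short_text = rewrite(short_texts[0])
        articles.append(event_summary + " " + event_short_text)  # splice
    return articles


# Made-up sample data for illustration only.
events = [
    HotEvent(
        "Typhoon damages cars in coastal city",
        ["Typhoon floods streets. More text.",
         "Hundreds of cars were damaged by flooding. Details follow."],
    ),
    HotEvent("Football final tonight", ["The final kicks off at eight."]),
]
keywords = {"insurance": ["typhoon", "accident"]}
knowledge_base = {"insurance": ["Vehicle policies differ in how they treat flood damage."]}

articles = generate_articles(events, keywords, knowledge_base)
for article in articles:
    print(article)
```

Passing a different `rewrite` callable is where a trained rewriting model would plug in; the selection rules marked as assumptions would likewise be replaced by whatever screening the trained models provide.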
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911112545.0A CN110956021A (en) | 2019-11-14 | 2019-11-14 | Original article generation method, device, system and server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110956021A (en) | 2020-04-03 |
Family
ID=69977340
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911112545.0A Pending CN110956021A (en) | 2019-11-14 | 2019-11-14 | Original article generation method, device, system and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110956021A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145482A (en) * | 2017-03-28 | 2017-09-08 | 百度在线网络技术(北京)有限公司 | Article generation method and device, equipment and computer-readable recording medium based on artificial intelligence |
CN107480127A (en) * | 2017-07-17 | 2017-12-15 | 广州特道信息科技有限公司 | The analysis of public opinion method and device |
CN107943774A (en) * | 2017-11-20 | 2018-04-20 | 北京百度网讯科技有限公司 | article generation method and device |
CN109657054A (en) * | 2018-12-13 | 2019-04-19 | 北京百度网讯科技有限公司 | Abstraction generating method, device, server and storage medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579800A (en) * | 2020-08-28 | 2021-03-30 | 太极计算机股份有限公司 | Automatic identification method for original news works and first-sending media of converged media |
CN112328856A (en) * | 2020-10-30 | 2021-02-05 | 中国平安人寿保险股份有限公司 | Common event tracking method and device, computer equipment and computer readable medium |
CN112612892A (en) * | 2020-12-29 | 2021-04-06 | 达而观数据(成都)有限公司 | Special field corpus model construction method, computer equipment and storage medium |
CN112612892B (en) * | 2020-12-29 | 2022-11-01 | 达而观数据(成都)有限公司 | Special field corpus model construction method, computer equipment and storage medium |
CN113688230A (en) * | 2021-07-21 | 2021-11-23 | 武汉众智数字技术有限公司 | Text abstract generation method and system |
CN114021527A (en) * | 2021-11-04 | 2022-02-08 | 北京香侬慧语科技有限责任公司 | Long text generation method, system, medium, and device |
CN116306514A (en) * | 2023-05-22 | 2023-06-23 | 北京搜狐新媒体信息技术有限公司 | Text processing method and device, electronic equipment and storage medium |
CN116306514B (en) * | 2023-05-22 | 2023-09-08 | 北京搜狐新媒体信息技术有限公司 | Text processing method and device, electronic equipment and storage medium |
CN117473072A (en) * | 2023-12-28 | 2024-01-30 | 杭州同花顺数据开发有限公司 | Financial research report generation method, device, equipment and storage medium |
CN117473072B (en) * | 2023-12-28 | 2024-03-15 | 杭州同花顺数据开发有限公司 | Financial research report generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||