CN117726826A - News report-oriented multi-scene AI aided manuscript writing method - Google Patents


Info

Publication number
CN117726826A
CN117726826A
Authority
CN
China
Prior art keywords
manuscript
news
text
user
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311417683.6A
Other languages
Chinese (zh)
Inventor
俞俊
程昱晨
匡振中
余宙
池陈宇
李朋
尤晓兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202311417683.6A priority Critical patent/CN117726826A/en
Publication of CN117726826A publication Critical patent/CN117726826A/en

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a news report-oriented multi-scene AI-assisted manuscript writing method, which comprises the following steps: preprocessing multi-source news materials; training an entity extraction model on the text materials; constructing an event map with the entity extraction model; selecting a news report scene, where the scenes comprise typical scenes and nonlinear complex scenes; and AI-assisted creation of the news manuscript. In typical news report scenes, a template-based rapid drafting function is provided; in nonlinear complex news report scenes, several AI-assisted drafting functions based on a large language model and cross-modal techniques are provided. For news manuscript review, an auditing function based on the Baidu AI Open Platform is provided. The method covers the full news-manuscript workflow of material selection, creation, and review, and can greatly improve the efficiency of news manuscript production.

Description

News report-oriented multi-scene AI aided manuscript writing method
Technical Field
The invention relates to the technical fields of natural language processing and cross-modal content generation, and in particular to a multi-scene AI-assisted writing method oriented toward news reports.
Background
With the continuous improvement of internet infrastructure and the rapid development of computer technology, information has become ever easier to obtain and transmit, and readers' demands for the richness and timeliness of news have grown accordingly; traditional news production pipelines that rely mainly on manual labor struggle to meet these demands.
The current production of news manuscripts remains in a labor-intensive, "handicraft" era. The process can be roughly divided into the following steps: (1) News source acquisition: the source is typically information published by an official body or related institution. (2) Manuscript writing: text workers integrate and organize the information into a news draft. (3) Manuscript review: professional auditors check the draft's sentences and content (e.g., typos, misused words, sensitive content) before release. The main drawbacks of this procedure are: (1) Lack of timeliness: traditional news media typically need some time to collect, edit, and release news; in the information age, this lag can leave content outdated, especially for events that require real-time updates. (2) One-sided information: with limited manpower and time, traditional media may cover only part of an event and cannot report fully and deeply, which can bias readers' understanding of the event. (3) Inefficiency: the whole process requires many professional practitioners and, constrained by labor costs, news production efficiency is hard to improve.
In recent years, thanks to the rapid development of deep learning, pre-trained language models such as the BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) series have shown strong performance on text tasks. Meanwhile, the rapid development of multimodal technology has bridged the gap between different data types (such as text, pictures, and video); for example, CLIP (Contrastive Language-Image Pretraining) can link text descriptions with image content, enabling cross-modal tasks. Compared with humans, these artificial intelligence models are considered to have the following advantages in text work: (1) far higher efficiency on simple text tasks (e.g., classification, translation, summarization); (2) effective handling of large-scale data, rapidly extracting valuable information from massive inputs; (3) highly consistent execution, unaffected by fatigue, emotion, or other factors, yielding high-quality and consistent results; (4) easy scaling to large workloads, whereas expanding human resources currently costs more in time and money than scaling an AI system. Thus, deeply combining artificial intelligence with news reporting scenarios is an effective way to overcome the shortcomings of traditional media.
In the related art, writing-material recommendation is based on text keyword extraction, which considers neither the continuity of a news event nor the relevance between different news subjects. The recommended materials are only weakly related to the news subject and can be read merely as supplements to the manuscript's keywords; they rarely give authors useful inspiration and cannot expand the breadth and depth of the news manuscript.
Related assisted-writing technologies such as CN106650943A focus on literary creation (e.g., word association, sentence polishing, writing-material recommendation), but AI writing assistance for news reporting scenarios (including multimodal news-material recommendation, news event context query, news-subject event-map query, automatic drafting of short-video manuscripts, automatic review of news manuscripts, etc.) has not yet been found.
Disclosure of Invention
Aiming at the complex practical demands of news manuscript production and the variety of application types to be demonstrated, the invention provides a news report-oriented multi-scene AI-assisted manuscript writing method that realizes a highly extensible and diversified drafting service.
The technical scheme of the invention is as follows:
A news report-oriented multi-scene AI-aided manuscript writing method comprises the following steps:
S1, collecting multi-source news materials, establishing a material database, and preprocessing the news materials. The multi-source news materials comprise text materials, picture materials, and video materials.
S1-1, converting each picture material into a fixed-length image embedding vector through the image encoder of CLIP;
S1-2, extracting key frames from each video material and converting each key-frame image into a fixed-length image embedding vector through the image encoder of CLIP;
S1-3, training an entity extraction model with the open-source knowledge-graph construction tool DeepKE, using the pre-trained BERT model provided by DeepKE as the base model.
Specifically, the text materials are first annotated to form a sufficiently large dataset. The annotated dataset is divided into a training set, a validation set, and a test set, and the extraction model is trained iteratively on it.
S1-4, extracting entity relations from the news materials with the entity extraction model of S1-3 to obtain (subject, relation, object) triples. An entity-relation graph is constructed with subjects and objects as nodes and relations as edges, and stored in a hash table mapping entity names to graph nodes, so that a node can be located quickly by entity name. The material library periodically updates its materials and re-runs entity extraction and relation-graph updates to keep the event map current.
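As an illustrative sketch (not the patented implementation), the triple-to-graph step with a hash-table index for O(1) node lookup by entity name might look like the following Python; the entity names and function name are made up for the example:

```python
from collections import defaultdict

def build_entity_graph(triples):
    """Build an entity-relation graph from (subject, relation, object) triples.

    Returns an adjacency map plus a hash table from entity name to its
    graph node (here the node is simply the name), so a node can be
    located in O(1) by entity name.
    """
    graph = defaultdict(list)           # node -> [(relation, neighbor), ...]
    node_index = {}                     # entity name -> graph node
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))  # store both directions for neighborhood queries
        node_index[subj] = subj
        node_index[obj] = obj
    return graph, node_index

triples = [
    ("West Lake", "located in", "Hangzhou"),
    ("Hangzhou", "hosted", "Asian Games"),
]
graph, index = build_entity_graph(triples)
```

Periodic material updates would then re-run extraction and merge new triples into `graph` the same way.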
S1-5, extracting keywords from the text materials by word frequency using Jieba word segmentation, recording them, and establishing a correspondence between keywords and text materials.
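A minimal sketch of the keyword-to-material correspondence of S1-5, substituting simple whitespace tokenization and a `Counter` for Jieba's frequency-based extraction (the function names and sample materials are illustrative):

```python
from collections import Counter, defaultdict

def extract_keywords(text, top_k=3):
    # Stand-in for Jieba: whitespace tokenization + word-frequency ranking.
    words = [w.lower().strip(".,") for w in text.split()]
    return [w for w, _ in Counter(words).most_common(top_k)]

def build_keyword_index(materials):
    """Map each extracted keyword to the set of material ids containing it."""
    index = defaultdict(set)
    for material_id, text in materials.items():
        for kw in extract_keywords(text):
            index[kw].add(material_id)
    return index

materials = {
    "m1": "flood flood rescue city",
    "m2": "flood warning river",
}
index = build_keyword_index(materials)
```

The inverted index is what later lets S3-2-2 retrieve related materials from the keywords of a user prompt.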
S2, selecting a news report scene, where the scenes comprise typical scenes and nonlinear complex scenes. Typical scenes include weather forecasts, listed-company financial reports, housing-price trends, daily fund reports, vegetable-price trends, car quotations, and sports news; nonlinear complex scenes are all scenes for which a report format cannot be preset, such as breaking hot events like major accidents and regional conflicts.
S3, auxiliary creation of news manuscript AI.
Preferably, when a typical scene is selected in step S2, step S3 proceeds as follows:
S3-1-1, the user selects a specific scene; typical scenes include weather forecasts, listed-company financial reports, housing-price trends, daily fund reports, vegetable-price trends, car quotations, and sports news.
S3-1-2, acquiring the latest relevant scene data according to the scene selected in S3-1-1. Specifically, a web crawler based on BeautifulSoup periodically grabs data from the data sources, and Redis caching is used to speed up responses under concurrent user requests. The acquired data are filled by type into the blanks of a preset template to form a template manuscript; the preset template is plain text;
S3-1-3, the user manually modifies the template manuscript to obtain the finished manuscript;
S3-1-4, submitting the finished manuscript to the AI for review. Specifically, the text and image auditing APIs provided by the Baidu AI Open Platform are used to review the manuscript and obtain modification suggestions. If the user accepts the suggestions, the process returns to S3-1-3.
Preferably, when a nonlinear complex scene is selected in step S2, step S3 proceeds as follows:
S3-2-1, the user selects a news creation topic from the pushed hot spots and breaking news events. Specifically, a list of hot news events is obtained through the Baidu real-time hot-spot API and pushed to the user.
S3-2-2, the user submits a corresponding prompt according to the selected news topic, and the prompt guides a pre-trained language model based on the open-source ChatGLM2-6B to generate text.
S3-2-3, searching the event map for materials associated with the news subject, and querying related subjects with high relevance to the news subject to help the user expand the manuscript content.
S3-2-4, automatically summarizing the text manuscript and generating titles and tags with the pre-trained ChatGLM2-6B language model to produce the news manuscript.
S3-2-5, after obtaining the news manuscript, the user selects the final output type for multimodal manuscript generation: either an image-text manuscript or a short-video manuscript.
S3-2-6, the user manually modifies the image-text manuscript.
S3-2-7, if an image-text manuscript was generated in S3-2-5, it can be submitted to the AI for review. Specifically, the text and image auditing APIs of the Baidu AI Open Platform are called to review the manuscript and obtain suggestions. If the user accepts the suggestions, the process returns to S3-2-6.
When a nonlinear complex scene is selected in step S2, step S3 also includes S3-2-2, automatic manuscript generation via the language model.
Preferably, automatic manuscript generation proceeds as follows: the user submits a prompt for the selected news topic containing key information such as the specific time, place, people, and events; keywords are extracted from the prompt with Jieba word segmentation; materials related to the keywords are retrieved through the correspondence established in S1-5; the prompt is then reconstructed from the related materials and the user input to guide the pre-trained language model based on the open-source ChatGLM2-6B to generate text. The concrete prompt-construction template is: "Based on the following article materials, write a news manuscript of no fewer than XXX words centered on the user's input prompt, and output it in JSON format with the key [content]. Article material 1; Article material 2; ...".
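A hedged sketch of the prompt-reconstruction step; the template wording is paraphrased from the description, and `build_generation_prompt` is an illustrative name, not from the source:

```python
def build_generation_prompt(user_prompt, article_materials, min_words=500):
    """Reassemble the model prompt: user intent first, retrieved materials
    appended, with the JSON output format requested explicitly."""
    materials = "\n".join(
        f"Article material {i}: {m}" for i, m in enumerate(article_materials, 1)
    )
    return (
        f"Based on the following article materials, write a news manuscript "
        f"of no fewer than {min_words} words centered on this prompt: "
        f'"{user_prompt}". Output JSON with the key [content].\n{materials}'
    )

prompt = build_generation_prompt("Typhoon lands in Zhejiang", ["report A", "report B"])
```

The resulting string would then be passed to the ChatGLM2-6B model's generation call.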
When a nonlinear complex scene is selected in step S2, step S3 also includes S3-2-3, retrieval of materials associated with the news subject based on the event map.
Preferably, the event map is searched as follows: entity extraction is performed on the user's manuscript with the model trained in S1-3 to obtain the related entities; the graph nodes corresponding to these entities are looked up in the node hash table generated in S1-4; in the entity-relation graph of S1-4, the adjacent entity nodes within radius R of the queried entity's node are collected to generate the event map; after rendering with the ECharts component, the event map is presented visually in the user interface, where clicking an entity node jumps directly to the corresponding Baidu Baike page; the user can reorganize the prompt according to the relations between the news subject and the related entities and regenerate the manuscript as in S3-2-2.
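The radius-R neighborhood query over the entity-relation graph is essentially a depth-bounded breadth-first search; a minimal sketch with a toy graph and illustrative names (the source does not specify the traversal algorithm):

```python
from collections import deque

def event_map_neighborhood(graph, center, radius):
    """Collect all entity nodes within `radius` hops of `center` (BFS),
    i.e. the subgraph rendered as the event map."""
    seen = {center: 0}          # node -> hop distance from center
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue            # do not expand beyond the radius
        for _, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return set(seen)

graph = {
    "West Lake": [("located in", "Hangzhou")],
    "Hangzhou": [("located in", "West Lake"), ("hosted", "Asian Games")],
    "Asian Games": [("hosted", "Hangzhou")],
}
nodes = event_map_neighborhood(graph, "West Lake", radius=1)
```

The returned node set is what would be handed to the ECharts renderer.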
When a nonlinear complex scene is selected in step S2, step S3 further includes S3-2-4, automatic summarization, title generation, and tag generation of the text manuscript via the language model.
Preferably, automatic summarization proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output a summary. Specifically, the prompt format is: "Summarize the following content and output it in JSON format with the key [summary]. <article content>".
Preferably, title generation proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output titles. Specifically, the prompt format is: "Analyze the following article, propose X to X titles of X to X characters each, and output them in JSON format with the key [title]. <article content>".
Preferably, tag generation proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output tags. Specifically, the prompt format is: "Analyze the following article, summarize X to X keywords, and output them in JSON format with the key [tag]. <article content>".
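The three prompt formats above can be sketched as small builder functions; the wording is paraphrased, and the X placeholders become parameters with illustrative defaults:

```python
def summary_prompt(article):
    return ("Summarize the following content and output it in JSON format "
            f"with the key [summary].\n{article}")

def title_prompt(article, n_min=3, n_max=5, len_min=10, len_max=20):
    return (f"Analyze the following article, propose {n_min} to {n_max} titles "
            f"of {len_min} to {len_max} characters each, and output them in "
            f"JSON format with the key [title].\n{article}")

def tag_prompt(article, n_min=3, n_max=5):
    return (f"Analyze the following article, summarize {n_min} to {n_max} "
            f"keywords, and output them in JSON format with the key "
            f"[tag].\n{article}")
```

Each builder's output is fed to the same ChatGLM2-6B generation call, and the JSON key tells the caller which field to parse from the model's reply.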
When a nonlinear complex scene is selected in step S2, step S3 also includes S3-2-5, AI-assisted generation of the image-text manuscript and the short-video manuscript.
Preferably, the image-text manuscript is generated as follows: the draft is split into fixed-length paragraphs, each of which is converted into a fixed-length text embedding vector through the text encoder of CLIP; the pre-trained CLIP model compares each text embedding with the image embeddings from S1-1, finds the image whose embedding matches best, and places that image after the corresponding text paragraph.
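The embedding comparison reduces to a cosine-similarity argmax; a dependency-free sketch with toy 3-dimensional vectors standing in for real CLIP embeddings (function names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_image_for_paragraph(text_vec, image_vecs):
    """Index of the image embedding most similar to the paragraph embedding."""
    return max(range(len(image_vecs)), key=lambda i: cosine(text_vec, image_vecs[i]))

text_vec = [1.0, 0.0, 0.0]
image_vecs = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],   # closest in direction to the paragraph vector
    [0.5, 0.5, 0.5],
]
idx = best_image_for_paragraph(text_vec, image_vecs)
```

With real CLIP, `text_vec` comes from the text encoder and `image_vecs` from the image embeddings stored in S1-1.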
Preferably, the short-video manuscript is generated as follows: the draft is split into fixed-length paragraphs, each converted into a fixed-length text embedding vector through the text encoder of CLIP; the pre-trained CLIP model compares each text embedding with the key-frame image embeddings from S1-2 and finds the best-matching key frame; a video segment is cut near that frame; the corresponding text paragraph is converted into audio with a TTS (Text-to-Speech) model and embedded into the segment; finally, the segments generated for all paragraphs are spliced in order to obtain the final short-video manuscript.
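One detail left open is how a segment is cut "near" the matched key frame; a plausible sketch, assuming a fixed segment length centered on the frame and clamped to the video bounds (the 8-second default and the function name are assumptions, not from the source):

```python
def clip_window(frame_time, video_duration, clip_len=8.0):
    """Cut a segment of `clip_len` seconds centered on the matched key
    frame's timestamp, clamped to the bounds of the source video."""
    start = max(0.0, frame_time - clip_len / 2)
    end = min(video_duration, start + clip_len)
    start = max(0.0, end - clip_len)  # re-anchor if clamped at the tail
    return start, end

s, e = clip_window(frame_time=3.0, video_duration=60.0)
```

The `(start, end)` pair would drive the actual cutting tool, with the TTS audio for the paragraph then muxed over the segment.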
The invention has the following characteristics and beneficial effects:
By adopting this technical scheme, the invention combines artificial intelligence with the news drafting workflow. With the support of natural language processing and multimodal techniques, drafting efficiency is greatly improved compared with traditional manual editing, and the barrier to entry for non-professional media such as self-media is greatly lowered. The template-based drafting function makes daily news drafting in typical scenes fast and automatic. The news event map and event context make material lookup more convenient and keep the materials closer to the news subject. Automatic summarization and tag generation avoid repetitive editorial labor. Intelligent image-text drafting shortens the tedious search for news pictures, and the short-video drafting function lets users without video-editing skills generate short-video manuscripts in one click.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a general framework of the method of the present invention.
Fig. 2 is a news draft flowchart of a typical scenario (embodiment 1).
Fig. 3 is a news draft flow chart of a nonlinear complex scene (embodiment 2).
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art in a specific case.
Example 1
In this embodiment, a news report-oriented multi-scene AI-aided manuscript writing method is provided, as shown in fig. 1 and fig. 2, comprising the following steps:
S1, performing rapid creation of a template manuscript in a typical scene.
S1-1, the user selects a specific scene; typical scenes include weather forecasts, listed-company financial reports, housing-price trends, daily fund reports, vegetable-price trends, car quotations, and sports news.
S1-2, acquiring the latest relevant scene data according to the scene selected in S1-1. Specifically, a web crawler based on BeautifulSoup periodically grabs data from the data sources, and Redis caching is used to speed up responses under concurrent user requests. The acquired data are filled by type into the blanks of a preset template to form a template manuscript.
A template is a pre-written paragraph into which specific data can be inserted.
For example, a typical preset template is: "In X district of X city, the average new-home price this month is X yuan and the average second-hand price is X yuan, up/down/flat compared with last month, a change of X%." After the user selects a specific urban district, the housing-price data are grabbed and filled into the template to obtain the template manuscript.
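A minimal sketch of filling the preset template's blanks with crawled data, using Python's `str.format`; the slot names and sample values are invented for illustration:

```python
def fill_template(template, data):
    """Fill a preset text template's named slots with freshly crawled data."""
    return template.format(**data)

template = ("In {district} of {city}, the average new-home price this month is "
            "{new_price} yuan; the average second-hand price is {used_price} yuan, "
            "{trend} {change}% from last month.")

manuscript = fill_template(template, {
    "city": "Hangzhou", "district": "Xihu District",
    "new_price": 35000, "used_price": 31000,
    "trend": "down", "change": 1.2,
})
```

In the described system the `data` dict would come from the BeautifulSoup crawler (via the Redis cache), and the filled string is the template manuscript handed to the user in S1-3.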
S1-3, the user manually modifies the template manuscript to obtain the finished manuscript;
S1-4, submitting the finished manuscript to the AI for review. Specifically, the text and image auditing APIs provided by the Baidu AI Open Platform are used to review the manuscript and obtain modification suggestions. If the user accepts the suggestions, the process returns to S1-3.
Example 2
In this embodiment, a news report-oriented multi-scene AI-aided manuscript writing method is provided, as shown in fig. 1 and 3, comprising the following steps:
S1, collecting multi-source news materials, establishing a material database, and preprocessing the news materials. The multi-source news materials comprise text materials, picture materials, and video materials.
S1-1, converting each picture material into a fixed-length image embedding vector through the image encoder of CLIP;
S1-2, extracting key frames from each video material and converting each key-frame image into a fixed-length image embedding vector through the image encoder of CLIP;
S1-3, training a relation extraction model with the open-source knowledge-graph construction tool DeepKE, using the pre-trained BERT model provided by DeepKE as the base model.
Specifically, the text materials are first annotated to form a sufficiently large dataset. The annotated dataset is divided into a training set, a validation set, and a test set, and the extraction model is trained iteratively on it.
For example, a typical piece of annotation data:
{"sentence": "Mention the beauty of Hangzhou, and West Lake is always the first word that comes to mind.", "relation": "in city", "head": "West Lake", "head_offset": 8, "tail": "Hangzhou", "tail_offset": 2}
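The train/validation/test split of the annotated dataset can be sketched as follows; the 8:1:1 ratio and fixed seed are illustrative assumptions, as the source does not specify them:

```python
import random

def split_dataset(samples, train=0.8, val=0.1, seed=42):
    """Shuffle labeled samples and split them into train/validation/test."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    data = list(samples)
    rng.shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

train_set, val_set, test_set = split_dataset(range(100))
```

In practice `samples` would be the annotated JSON records, and the three splits would drive DeepKE's iterative training loop.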
S1-4, extracting entity relations from the news materials with the entity extraction model of S1-3 to obtain (subject, relation, object) triples. An entity-relation graph is constructed with subjects and objects as nodes and relations as edges, and stored in a hash table mapping entity names to graph nodes, so that a node can be located quickly by entity name. The material library periodically updates its materials and re-runs entity extraction and relation-graph updates to keep the event map current.
S1-5, extracting keywords from the text materials by word frequency using Jieba word segmentation, recording them, and establishing a correspondence between keywords and text materials.
S2, in a nonlinear complex scene, news manuscript creation is performed with AI assistance.
S2-1, providing a list of hot news events to the user. Specifically, the list is obtained from the Baidu real-time hot-spot API at a preset interval and cached in Redis; when a user requests hot news events, the cached list is returned.
S2-2, the user submits a corresponding prompt according to the selected news topic, and the prompt guides a pre-trained language model based on the open-source ChatGLM2-6B to generate text.
S2-3, searching the event map for materials associated with the news subject, and querying related subjects with high relevance to the news subject to help the user expand the manuscript content.
S2-4, automatically summarizing the text manuscript and generating titles and tags with the pre-trained ChatGLM2-6B language model to produce the news manuscript.
S2-5, after the news manuscript is obtained, the user selects the final output type: either an image-text manuscript or a short-video manuscript.
S2-6, if an image-text manuscript was selected in S2-5, the user manually modifies it.
S2-7, if an image-text manuscript was generated in S2-5, it can be submitted to the AI for review. Specifically, the text and image auditing APIs of the Baidu AI Open Platform are called to review the manuscript and obtain suggestions. If the user accepts the suggestions, the process returns to S2-6.
Step S2 includes S2-2, automatic manuscript generation via the language model.
Preferably, automatic manuscript generation via the language model proceeds as follows: the user submits a prompt for the selected news topic containing key information such as the specific time, place, people, and events;
keywords are extracted from the prompt with Jieba word segmentation, and materials related to the keywords are retrieved through the correspondence established in S1-5.
The prompt is then reconstructed from the related news materials and the user input to guide the pre-trained language model based on the open-source ChatGLM2-6B to generate text. The concrete prompt-construction template is: "Based on the following article materials, write a news manuscript of no fewer than XXX words centered on the user's input prompt, and output it in JSON format with the key [content]. Article material 1; Article material 2; ...".
Step S2 includes S2-3, retrieval of materials associated with the news subject based on the event map.
Preferably, the event map is searched as follows: entity extraction is performed on the user's manuscript with the model trained in S1-3 to obtain the related entities; the graph nodes corresponding to these entities are looked up in the node hash table generated in S1-4; in the entity-relation graph of S1-4, the adjacent entity nodes within radius R of the queried entity's node are collected to generate the event map; after rendering with the ECharts component, the event map is presented visually in the user interface, where clicking an entity node jumps directly to the corresponding Baidu Baike page; the user may reorganize the prompt based on the relations between the news subject and the related entities to regenerate the manuscript as in S2-2.
S2-4 performs automatic summarization, title generation, and tag generation on the text manuscript via the language model.
Preferably, automatic summarization proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output a summary. Specifically, the prompt format is: "Summarize the following content and output it in JSON format with the key [summary]. <article content>".
Preferably, title generation proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output titles. Specifically, the prompt format is: "Analyze the following article, propose X to X titles of X to X characters each, and output them in JSON format with the key [title]. <article content>".
Preferably, tag generation proceeds as follows: a prompt is constructed to guide the pre-trained ChatGLM2-6B language model to output tags. Specifically, the prompt format is: "Analyze the following article, summarize X to X keywords, and output them in JSON format with the key [tag]. <article content>".
S2-5 generates the image-text manuscript and the short-video manuscript through AI-assisted creation.
Preferably, the image-text manuscript is generated as follows: the draft is split into fixed-length paragraphs, each of which is converted into a fixed-length text embedding vector through the text encoder of CLIP; the pre-trained CLIP model compares each text embedding with the image embeddings from S1-1, finds the image whose embedding matches best, and places that image after the corresponding text paragraph.
The short-video manuscript generation method is as follows: split the first draft into fixed-length paragraphs and convert each paragraph into a fixed-length text embedding vector with the CLIP text encoder; using the pre-trained CLIP model, compare each text embedding vector with the key-frame image embedding vectors from S1-2 and find the best-matching key frame; cut a video segment around that frame, convert the corresponding text segment into audio with a TTS (Text-to-Speech) model, and embed the audio into the video segment; finally, splice the video segments generated from all text segments in order to obtain the final short-video manuscript.
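One step of this pipeline, cutting a segment "around that frame", might look like the following (the keyframe timestamp, clip half-width, and video duration are assumed parameters, not values from the patent):

```python
def clip_around(keyframe_t, half_width, video_duration):
    """Return (start, end) of a clip centered on the key frame, clamped to the video."""
    start = max(0.0, keyframe_t - half_width)
    end = min(video_duration, keyframe_t + half_width)
    return start, end

# e.g. a key frame at t = 3 s in a 60 s video, taking 5 s on each side;
# the clip is clamped at the start of the video
segment = clip_around(3.0, 5.0, 60.0)
```

The resulting (start, end) pair would be fed to a video-cutting tool, with the TTS audio for the paragraph muxed into the extracted segment.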
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of the invention.

Claims (10)

1. A multi-scene AI-assisted manuscript writing method for news reporting, characterized by comprising the following steps:
S1: collect multi-source news materials, establish a material database, and preprocess the multi-source news materials, which comprise text materials, picture materials, and video materials;
S1-1: convert each picture material into a fixed-length image embedding vector with the image encoder of CLIP;
S1-2: extract key frames from each video material and convert each key-frame image into a fixed-length image embedding vector with the image encoder of CLIP;
S1-3: use the open-source knowledge-graph construction tool DeepKE, take the pre-trained BERT model provided by DeepKE as the base model, and train a relation extraction model;
the training method is as follows: first annotate the text materials with a relation extraction model to form a sufficient dataset, and divide the annotated dataset into a training set, a validation set, and a test set; then iteratively train the relation extraction model on the dataset;
S1-4: perform entity-relation extraction on the news manuscript materials with the relation extraction model of S1-3 to obtain triples, each comprising a subject, a relation, and an object; taking subjects and objects as nodes and relations as edges, construct an entity-relation graph and store it in a hash table in the form of entity name → graph node, so that graph nodes can be located quickly by entity name; the material library periodically updates the manuscript materials and performs entity extraction and relation-graph updates accordingly;
S1-5: extract keywords from the text materials with Jieba word segmentation in a word-frequency-based manner, record them, and establish the correspondence between keywords and text materials;
S2: select a news report scene, the news report scenes comprising typical scenes and nonlinear complex scenes; the typical scenes include weather forecasts, financial reports of listed companies, housing-price trends, daily fund broadcasts, vegetable-price trends, automobile quotations, and sports news; a nonlinear complex scene is any scene other than a typical scene, for which the news manuscript format cannot be preset, such as sudden hot events like major accidents and regional conflicts;
S3: AI-assisted creation of the news manuscript.
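Step S1-4's graph construction can be sketched as follows, assuming triples arrive as (subject, relation, object) tuples; the example triples are invented:

```python
# Triples as produced by the S1-3 relation extraction model (invented examples)
triples = [
    ("Company A", "acquired", "Company B"),
    ("Company B", "located_in", "City C"),
    ("Company A", "located_in", "City C"),
]

node_of = {}   # entity name -> graph node id (the hash table of S1-4)
edges = []     # (subject id, relation, object id)

def node_id(name):
    # Assign ids on first sight, so lookup by entity name stays O(1)
    if name not in node_of:
        node_of[name] = len(node_of)
    return node_of[name]

for subj, rel, obj in triples:
    edges.append((node_id(subj), rel, node_id(obj)))
```

When the material library is refreshed, rerunning this loop over the new triples extends `node_of` and `edges` in place, which matches the periodic relation-graph update described in S1-4.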
2. The multi-scene AI-assisted manuscript writing method for news reporting of claim 1, wherein, when a typical scene is selected in step S2, the specific method of step S3 is as follows:
S3-1-1: the user selects a specific scene;
S3-1-2: according to the scene selected in S3-1-1, periodically crawl the data sources with a BeautifulSoup-based web-crawler tool, use Redis caching to improve the data response speed when multiple users issue requests, and fill the obtained data by type into the vacancies of a preset template to form a template manuscript, the preset template being in text form;
S3-1-3: the user manually modifies the template manuscript to obtain a manuscript;
S3-1-4: the manuscript is submitted for AI review; the text and image review APIs provided by the Baidu AI open platform are called to review the manuscript and obtain modification suggestions; if the user accepts the modification suggestions, return to S3-1-3.
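The template-filling step S3-1-2 reduces to slotting crawled fields into a text template. A minimal sketch for an assumed weather-forecast scene (template wording and data fields are invented):

```python
# Preset text template for the typical "weather forecast" scene (invented wording)
TEMPLATE = ("{city} weather forecast for {date}: {condition}, "
            "temperatures from {low} to {high} degrees Celsius.")

# Fields as a BeautifulSoup-based crawler might return them (invented values);
# in the described system this dict would come from Redis-cached crawl results
crawled = {"city": "Hangzhou", "date": "2023-10-30",
           "condition": "sunny", "low": 15, "high": 24}

# Fill the template vacancies by field type to form the template manuscript
draft = TEMPLATE.format(**crawled)
```

The user then edits `draft` manually (S3-1-3) before submitting it for AI review (S3-1-4).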
3. The multi-scene AI-assisted manuscript writing method for news reporting of claim 1, wherein, when a nonlinear complex scene is selected in step S2, the specific method of step S3 is as follows:
S3-2-1: obtain a list of hot news events through the Baidu real-time hot-topic API and push it to the user; the user selects a news creation topic from the pushed hot topics and breaking news events;
S3-2-2: the user submits prompt sentences for the selected news topic, and the prompt sentences guide a pre-trained language model based on the open-source ChatGLM2-6B to generate text;
S3-2-3: search for materials related to the news subject with the event map to help the user expand the manuscript content;
S3-2-4: perform automatic summarization, title generation, and tag generation on the text manuscript with the pre-trained ChatGLM2-6B language model to obtain the news manuscript;
S3-2-5: after the news manuscript is obtained, the user selects the final manuscript type and a multimodal manuscript is generated; the manuscript types include image-text manuscripts and short-video manuscripts;
S3-2-6: if the user chose to generate an image-text manuscript in S3-2-5, the image-text manuscript can be modified manually;
S3-2-7: if the user chose to generate an image-text manuscript in S3-2-5, the image-text manuscript can be submitted for AI review; the text and image review APIs provided by the Baidu AI open platform are called to review the manuscript and obtain review suggestions; if the user accepts the suggestions, return to S3-2-6.
4. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein, in step S3-2-2, the specific text-generation steps are as follows: the user submits prompt sentences containing the specific time, place, people, and events; keywords are extracted from the prompt sentences with Jieba word segmentation; related manuscript materials are retrieved with the text-material keyword records generated in S1-5; the manuscript materials are combined with the user input to reconstruct a prompt sentence that guides the pre-trained language model based on the open-source ChatGLM2-6B to generate text. The prompt construction template is: 'Based on the following article materials, write a news manuscript of no fewer than XXX words centered on "<the user's prompt sentence>", and output it in JSON format with the key "content". The article materials are: "<article material 1>", "<article material 2>", ...'.
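The retrieval-augmented prompt construction in claim 4 can be sketched as follows; keyword matching is reduced to a prebuilt keyword → material index (the real method uses Jieba segmentation and the S1-5 records), and all index contents are invented:

```python
# keyword -> material texts, standing in for the S1-5 keyword records
keyword_index = {
    "flood": ["Article about flood relief efforts ..."],
    "rescue": ["Article about rescue teams ..."],
}

def retrieve(prompt_keywords):
    """Collect the materials associated with the prompt's keywords."""
    materials = []
    for kw in prompt_keywords:
        materials.extend(keyword_index.get(kw, []))
    return materials

def build_prompt(user_prompt, materials, min_words=500):
    """Rebuild the prompt from the user input plus the retrieved materials."""
    body = ", ".join('"{}"'.format(m) for m in materials)
    return ('Based on the following article materials, write a news manuscript '
            'of no fewer than {} words centered on "{}", and output it in JSON '
            'format with the key "content". The article materials are: {}'
            .format(min_words, user_prompt, body))

prompt = build_prompt("city flood rescue", retrieve(["flood", "rescue"]))
```

The reconstructed prompt is then sent to the ChatGLM2-6B-based model, grounding the generated manuscript in the retrieved materials rather than the model's memory alone.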
5. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein, in step S3-2-3, the specific steps of searching for news-subject-related materials with the event map to assist writing are as follows: perform entity extraction on the user's manuscript with the relation extraction model trained in S1-3 to obtain the related entities; look up the graph nodes corresponding to those entities in the node hash table generated in S1-4; in the entity-relation graph generated in S1-4, take the graph node of the queried entity as the center and collect all neighboring entity nodes within radius R to generate the event map; render the event map with the ECharts component and present it visually on the user interface, where clicking an entity node jumps directly to the corresponding Baidu Baike page; the user can reorganize the prompt sentences according to the relations between the news subject and the related entities and regenerate the manuscript in the manner of S3-2-2.
6. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein, in step S3-2-4, the automatic summarization method is as follows: construct a prompt sentence to guide the pre-trained ChatGLM2-6B language model to output a summary of the article content; the prompt format is: 'Summarize the following article content and output it in JSON format with the key "summary". <specific article content>'.
7. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein, in step S3-2-4, the title generation method is as follows: construct a prompt sentence to guide the pre-trained ChatGLM2-6B language model to generate titles from the article content; the prompt format is: 'Analyze the following article content, propose X to X titles for the article, each X to X characters long, and output them in JSON format with the key "title". <specific article content>'.
8. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein, in step S3-2-4, the tag generation method is as follows: construct a prompt sentence to guide the pre-trained ChatGLM2-6B language model to output tags from the article content; the prompt format is: 'Analyze the following article content, summarize X to X keywords, and output them in JSON format with the key "tag". <specific article content>'.
9. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein the image-text manuscript generation method in step S3-2-5 is as follows: split the first draft into fixed-length paragraphs and convert each paragraph into a fixed-length text embedding vector with the CLIP text encoder; using the pre-trained CLIP model, compare each text embedding vector with the image embedding vectors from S1-1, find the image whose embedding matches best, and place that image after the corresponding text segment.
10. The multi-scene AI-assisted manuscript writing method for news reporting of claim 3, wherein the short-video manuscript generation method in step S3-2-5 is as follows: split the first draft into fixed-length paragraphs and convert each paragraph into a fixed-length text embedding vector with the CLIP text encoder; using the pre-trained CLIP model, compare each text embedding vector with the key-frame image embedding vectors from S1-2 and find the best-matching key frame; cut a video segment around that frame, convert the corresponding text segment into audio with the TTS model, and embed the audio into the video segment; finally, splice the video segments generated from all text segments in order to obtain the final short-video manuscript.
CN202311417683.6A 2023-10-30 2023-10-30 News report-oriented multi-scene AI aided manuscript writing method Pending CN117726826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311417683.6A CN117726826A (en) 2023-10-30 2023-10-30 News report-oriented multi-scene AI aided manuscript writing method


Publications (1)

Publication Number Publication Date
CN117726826A 2024-03-19

Family

ID=90204087




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination