CN116579308B - Presentation generation method and device - Google Patents

Presentation generation method and device

Publication number
CN116579308B
Authority
CN
China
Prior art keywords
presentation
title
text
page
generating
Prior art date
Legal status
Active
Application number
CN202310819781.6A
Other languages
Chinese (zh)
Other versions
CN116579308A (en)
Inventor
张丽颖
费军波
张云云
张莹
程稳
李勇
陈�光
曾令仿
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310819781.6A priority Critical patent/CN116579308B/en
Publication of CN116579308A publication Critical patent/CN116579308A/en
Application granted granted Critical
Publication of CN116579308B publication Critical patent/CN116579308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS / G06 COMPUTING; CALCULATING OR COUNTING / G06F ELECTRIC DIGITAL DATA PROCESSING / G06F40/00 Handling natural language data
        • G06F40/10 Text processing
            • G06F40/166 Editing, e.g. inserting or deleting
            • G06F40/103 Formatting, i.e. changing of presentation of documents
            • G06F40/109 Font handling; Temporal or kinetic typography
            • G06F40/12 Use of codes for handling textual entities
            • G06F40/14 Tree-structured documents
        • G06F40/20 Natural language analysis
            • G06F40/205 Parsing
            • G06F40/216 Parsing using statistical methods
            • G06F40/237 Lexical tools
            • G06F40/258 Heading extraction; Automatic titling; Numbering
            • G06F40/279 Recognition of textual entities
            • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC / Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE / Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
        • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a presentation generation method and device. The method comprises the following steps: acquiring a theme for which a presentation is to be generated, and obtaining the secondary titles of the presentation and the text content under each secondary title from a text generation module built and trained in advance; structuring the theme, the secondary titles, and the text content under each secondary title into multiple parts, treating each part as one presentation page, and extracting keywords from every page other than the cover page and the contents page; generating a matching illustration for each presentation page through a text-to-image module based on the extracted keywords; and automatically typesetting the partitioned text content together with each page's matching illustration to obtain a complete presentation.

Description

Presentation generation method and device
Technical Field
The application belongs to the fields of natural language processing and multimodal learning, and in particular relates to a presentation generation method and device.
Background
With the development of artificial intelligence, more and more fields have begun to use AI technology to improve work efficiency and reduce labor costs. The presentation is a common business tool, widely used in demonstrations, conferences, reports, and similar occasions, but producing one takes a great deal of time and effort, and the presentation must also be designed and typeset. In response to this problem, generating presentations with AI technology has become a trend.
The technical core of AI presentation generation is to use deep learning algorithms to learn and imitate human design thinking and typesetting skills, thereby automatically generating presentations. Specifically, this technique generally includes the following aspects: natural language processing, which analyzes and processes the input theme text and generates the presentation's text content from the theme; text-to-image generation, which produces the presentation's illustrations from text; and automatic typesetting, which lays out the presentation based on the generated text and images. Whether the generated presentation is good depends mainly on whether the natural language processing part can generate accurate text content and whether the text-to-image part can generate accurate matching illustrations.
Recently, a wide variety of applications have been built on the published API of the chatbot ChatGPT. ChatBG, an application for automatically producing presentations, was developed by two students on top of ChatGPT and DALL-E 2. Because ChatBG discloses very few technical details, judging from its brief introduction and the examples shown on its official website, it presumably consists of three tasks: first, text generation, using ChatGPT to generate the text the presentation needs; second, image generation, using DALL-E 2 to generate the matching illustrations; and third, designing the presentation layout and theme. ChatGPT and DALL-E 2 are, respectively, a chatbot and a large text-to-image model: ChatGPT has strong text generation capability, and DALL-E 2 can generate corresponding images from a text description. This approach relies entirely on the training results of the large models, and for most researchers such large models are difficult to train, so the fully trained model becomes a black box that cannot be continuously modified and optimized internally.
In addition, few training details of large models similar to ChatGPT have been disclosed abroad, and their APIs are not open domestically. Thus the interface cannot be invoked directly to generate presentation text. Choosing to retrain a large text generation model and a large text-to-image model faces two main problems: 1. Few Chinese presentation datasets are publicly available, so a large number of presentations would have to be collected anew, with a great deal of data preprocessing, wasting time and labor. 2. Few training details for presentation generation are disclosed; although the overall pipeline is fixed, how to make a text generation model produce presentation text content, and how to generate the matching illustrations from that content, are not described in detail.
Disclosure of Invention
In view of the deficiencies of the prior art, embodiments of the present application aim to provide a presentation generation method and device that generate presentations with AIGC technology, thereby greatly improving the efficiency of producing presentations at work and reducing time cost.
According to a first aspect of embodiments of the present application, there is provided a presentation generation method, comprising:
acquiring a theme for which a presentation is to be generated, and obtaining the secondary titles of the presentation and the text content under each secondary title from a text generation module built and trained in advance;
structuring the theme, the secondary titles, and the text content under each secondary title into multiple parts, treating each part as one presentation page, and extracting keywords from every page other than the cover page and the contents page;
generating a matching illustration for each presentation page through a text-to-image module based on the extracted keywords;
and automatically typesetting the partitioned text content together with each page's matching illustration to obtain a complete presentation.
Further, the text generation module comprises a title generation model and a content generation model: the title generation model generates the presentation's secondary titles from its theme, and the content generation model generates the corresponding text content under each secondary title from that title.
Further, according to the user's actual situation, one of the following modes is selected to obtain the secondary titles of the presentation:
(1) directly using self-defined presentation secondary titles provided by the user;
(2) based on a preset presentation template, directly inserting the theme into each template title;
(3) acquiring the theme for which the presentation is generated, inputting the theme into the title generation model, and generating the secondary titles of the presentation.
Further, training of the title generation model comprises a pre-training stage and a fine-tuning stage:
the loss function of the pre-training stage is
$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)$$
where $k$ is the size of the context window, $P$ is the conditional probability, $\Theta$ denotes the hyperparameters of the model, and $U$ is an unsupervised corpus;
the loss function of the fine-tuning stage is
$$L_3(C) = L_2(C) + \lambda \cdot L_1(C), \qquad L_2(C) = \sum_{(x,y)} \log P(y \mid x_1, \dots, x_m)$$
where $C$ is a labeled Chinese dataset whose samples consist of a sequence of Chinese words $x_1, \dots, x_m$ and a label $y$; $x_i$ is a Chinese word in $C$, $m$ is the number of Chinese words in $C$, and $\lambda$ is a weight parameter.
Further, the content generation model is obtained by fine tuning the title generation model after the pre-training stage.
Further, structuring the theme, the secondary titles, and the text content under each secondary title into multiple parts, treating each part as one presentation page, and extracting keywords from pages other than the cover page and the contents page comprises:
taking the theme of the presentation as its title; arranging the secondary titles in order as the presentation's contents page; taking each generated secondary title as the title of one presentation page; and taking the text content under each secondary title as the content generated for the corresponding page's title;
and constructing a keyword extraction module that extracts keywords from the title of each presentation page and the content generated from that title.
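A minimal sketch of this structuring step (the field names and the dict-based page representation are assumptions of this sketch, not part of the patent):

```python
# Hypothetical sketch of the structuring described above: the theme becomes the
# cover page, the secondary titles in order form the contents page, and each
# secondary title plus its generated text becomes one body page. Keyword
# extraction then applies only to the body pages, matching the exclusion of
# the cover and contents pages.
def structure_pages(theme, sections):
    """sections: dict mapping secondary title -> generated text content."""
    pages = [
        {"kind": "cover", "title": theme},
        {"kind": "contents", "items": list(sections)},
    ]
    for title, content in sections.items():
        pages.append({"kind": "body", "title": title, "content": content})
    return pages
```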
Further, for the title of each presentation page and the content generated from that title, the term frequency-inverse document frequency (TF-IDF) value of each word is computed, and the several words with the largest TF-IDF values are taken as the keywords.
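A minimal pure-Python sketch of this TF-IDF ranking, assuming the page text has already been segmented into words (the patent operates on Chinese words; English tokens are used here purely for illustration, and the smoothed IDF variant is an assumption of this sketch):

```python
import math
from collections import Counter

def tfidf_keywords(page_tokens, all_pages_tokens, top_k=3):
    """Rank the words of one page by TF-IDF against the other pages.

    page_tokens: list of words for the page under consideration.
    all_pages_tokens: list of token lists, one per presentation page
    (the document collection for the inverse-document-frequency term).
    """
    n_docs = len(all_pages_tokens)
    tf = Counter(page_tokens)
    total = len(page_tokens)
    scores = {}
    for word, count in tf.items():
        # document frequency: number of pages containing the word
        df = sum(1 for doc in all_pages_tokens if word in doc)
        idf = math.log(n_docs / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / total) * idf
    # the words with the largest TF-IDF values become the page's keywords
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

Words that occur often on one page but rarely on the others score highest, which is why they serve as prompts for the illustration of that page.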
According to a second aspect of embodiments of the present application, there is provided a presentation generating apparatus, comprising:
an acquisition module, configured to acquire the theme for which a presentation is to be generated and obtain all text content of the presentation from the text generation module built and trained in advance;
a keyword extraction module, configured to structure the theme, the secondary titles, and the text content under each secondary title into multiple parts, treat each part as one presentation page, and extract keywords from every page other than the cover page and the contents page;
an image generation module, configured to generate a matching illustration for each presentation page through the text-to-image module based on the extracted keywords;
and a typesetting module, configured to automatically typeset the partitioned text content together with each page's matching illustration to obtain a complete presentation.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
as can be seen from the above embodiments, the present application provides a method and an apparatus for generating a presentation based on AIGC (Artificial Intelligence Generated Content), develops a new scheme based on an existing link mode, provides overall link mode details, can generate an overall presentation based on a theme, can realize efficient, accurate and personalized presentation production, not only can save time and effort for a user, but also can greatly improve quality and effect of the presentation, and allows the user to better show his own ideas and ideas; the method can form good floor application for the combination of a text generation task in the natural language processing field and a text generation image task in the multi-mode field, and has important theoretical significance and practical value for the rapid development of the two fields.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is an overall technical roadmap showing a presentation generating method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a presentation generation method according to an exemplary embodiment.
Fig. 3 is a block diagram illustrating a presentation generating apparatus according to an exemplary embodiment.
Fig. 4 is a schematic diagram of an electronic device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The technical terms of the present application are explained and explained below:
text generation pre-training model: the mainstream solution for text generation pre-training models is the Autoregressive (AR) model. To enhance the text generation effect, AR improvement is divided into 2 directions: (1) GPT series models, the GPT series models increase corpus by improving parameters, so that the effect is obviously improved, but the cost is really low. (2) The context can be seen when text is generated by autoregressive by fusing an Autorecoder (AE) technique on the basis of AR. AE and AR fusion is divided into 2 parts, the first part based on improvement under the BERT framework and the second part based on improvement under the Seq2Seq framework. In summary, the fusion of AR and AE is more consistent with human language habits, and the seemingly verbatim (left to right, context dependent) is actually verbatim after the overall concept and organization (left to right, context dependent).
Text-to-image model: a text-to-image model converts a given text description into a corresponding image. These models are typically based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and use generative adversarial networks (GANs), variational autoencoders (VAEs), or similar generative models. Their working principle is to encode the text description as a vector or matrix and feed it into an image generator that produces the corresponding image. Typically this process splits into two parts: first, the text-to-image conversion itself; second, optimization of the generated images to ensure they match the given text description. A GAN trains a generator network against a discriminator network, so that the generator produces images as close to real ones as possible while the discriminator judges whether the generated images resemble real ones. A VAE uses an encoder to convert the input text description into vectors in a latent space, then uses a decoder to decode the latent vectors into the corresponding images.
AIGC: AIGC (Artificial Intelligence Generated Content) refers to the use of artificial intelligence techniques to generate various forms of content, such as articles, pictures, video, audio, etc. The AIGC technique utilizes artificial intelligence algorithms for data analysis, learning and generation, thereby enabling rapid, automatic, efficient generation of large amounts of digital content.
The application provides a presentation generating method, as shown in fig. 1 and 2, which is applied to a terminal and can comprise the following steps:
s11: acquiring a theme for generating a presentation, and acquiring all text contents of the presentation based on a text generation module which is constructed and trained in advance;
s12: structuring the theme, the secondary title and the text content under each secondary title of the presentation to obtain a plurality of parts, taking each part as one presentation page, and extracting keywords from other pages except the first page and the catalog page;
s13: generating a picture matching image corresponding to each page of presentation manuscript through a text generation image module based on the extracted keywords;
s14: and automatically typesetting the divided text content and the map matching image of the corresponding page to obtain a complete presentation.
The following description uses the theme "safe electricity use" as an example.
In the specific implementation of S11, a theme for which a presentation is to be generated is acquired, and all text content of the presentation is obtained from the text generation module built and trained in advance.
specifically, the text generation module includes a title generation model and a content generation model.
The title generation model is used to generate the presentation's secondary titles from its theme. Three cases are distinguished: (1) if the user can provide self-defined secondary titles, they are used directly; (2) if the user provides no self-defined secondary titles but is satisfied with preset ones, the theme can be inserted into each title of a preset presentation template; note that preset title formats are relatively fixed and suit relatively fixed scenarios such as a thesis defense, with basic titles like "topic background, main innovations, outlook and summary"; (3) if the user neither provides self-defined secondary titles nor is satisfied with the presets, a title generation model is constructed, the theme for which the presentation is generated is acquired and input into the model, and the secondary titles are generated; titles generated by the model are relatively open-ended and suit more general situations.
Specifically, for case (1), the secondary titles input by the user are acquired directly and need no processing. In this embodiment, based on the theme "safe electricity use", four secondary titles are preset: "importance of safe electricity use", "classification of electric shock accidents", "preventive measures for safe electricity use", and "eight principles of safe electricity use".
For case (2), in one embodiment, the preset presentation templates include:
Template one: science popularization, comprising a basic introduction to the topic, its development trend, its advantages and disadvantages, its key significance, and cases;
Template two: research report, comprising the research background, research scheme, research results, and research summary of the topic;
Template three: marketing, comprising a background overview of the topic, an analysis of its current market, a marketing plan, and the expected marketing effect.
In a specific implementation, presentation templates can be added according to actual conditions.
In this embodiment, template one (science popularization) is selected, constructing the titles "basic introduction to safe electricity use", "development trend of safe electricity use", "importance of safe electricity use", "key significance of safe electricity use", and "cases of safe electricity use".
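Case (2) can be sketched as a simple template fill. The heading wording below follows the templates and embodiment above, but the data structure itself is an assumption of this sketch:

```python
# Hypothetical preset templates for case (2): the theme is inserted into each
# fixed heading of the chosen template. Headings follow templates one and two.
TEMPLATES = {
    "science popularization": [
        "basic introduction to {theme}",
        "development trend of {theme}",
        "importance of {theme}",
        "key significance of {theme}",
        "cases of {theme}",
    ],
    "research report": [
        "research background of {theme}",
        "research scheme of {theme}",
        "research results of {theme}",
        "research summary of {theme}",
    ],
}

def titles_from_template(theme, template_name):
    """Fill the theme into every heading of the selected template."""
    return [h.format(theme=theme) for h in TEMPLATES[template_name]]
```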
For case (3), a title generation model can be designed by selecting a text generation pre-training model; in an embodiment, a GPT (Generative Pre-Trained Transformer) model is selected and then pre-trained and fine-tuned. In the pre-training part of the model, $u_i$ denotes each token (a unit obtained by segmenting a text sample; in the present application, a Chinese word). With the window size set to $k$, when predicting the $i$-th word of a sentence, the $k$ words before the $i$-th word, together with the hyperparameters $\Theta$, are used to predict what the $i$-th word may be. In short, the preceding words are used to predict the following word. A language model GPT is constructed, with the loss function defined as follows:
wherein the method comprises the steps ofFor the size of the contextual window, +.>For conditional probability +.>Is a superparameter of the model,/->Is an unsupervised corpus;
the language model uses a multi-layer transducer decoder, the model applies a multi-headed attention mechanism to the input context labels, and then connects the location feed forward network such that an output distribution is produced on the target token:
wherein the method comprises the steps ofIs a token context vector, < +.>Is the number of layers of the transducer decoder, < >>Is an embedding matrix of token, +.>Is a location embedding matrix.
At this point the dataset, drawn from a large number of Chinese datasets (constructed from self-collected data or downloaded open-source datasets), is processed into the format {"text": "topic\n\ntopic"}. It is then input into the GPT model for full pre-training, which stops once the model's loss curve converges.
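The pre-training preprocessing can be sketched as follows. The exact payload of the {"text": ...} field is only partially legible in the text above, so this sketch assumes each sample is a topic and a body joined by a blank line; the function name is also an assumption:

```python
# Hypothetical preprocessing sketch: wrap raw corpus samples into the
# {"text": ...} format described above, one JSON object per line (JSONL).
import json

def to_pretrain_jsonl(samples):
    """samples: iterable of (topic, body) pairs."""
    return "\n".join(
        json.dumps({"text": f"{topic}\n\n{body}"}, ensure_ascii=False)
        for topic, body in samples
    )
```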
Next, supervised fine-tuning is performed based on the fully pre-trained model. Assume a labeled dataset $C$, where each sample consists of a sequence of input words $x_1, \dots, x_m$ and a label $y$, the $x_i$ being the Chinese words of a Chinese sample. The activation $h_l^m$ of the final Transformer block, obtained from the extensively trained pre-training model, is input into an added linear layer with parameters $W_y$ for prediction:
$$P(y \mid x_1, \dots, x_m) = \mathrm{softmax}(h_l^m W_y)$$
Thus, the objective function to maximize is:
$$L_2(C) = \sum_{(x, y)} \log P(y \mid x_1, \dots, x_m)$$
Combining $L_2(C)$ and $L_1(C)$ with a weight parameter $\lambda$ controlling the proportion yields the final objective function of the fine-tuning stage:
$$L_3(C) = L_2(C) + \lambda \cdot L_1(C)$$
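The pre-training objective (the sum of log conditional probabilities of each word given its $k$ predecessors) can be illustrated numerically with a toy count-based context model; everything here is illustrative, and the counting model is merely a stand-in for the Transformer:

```python
# Toy numeric illustration of the pre-training objective: the sum of the log
# probabilities of each token given its k predecessors. The "model" here is a
# plain count-based estimate over the corpus itself, not a neural network.
import math
from collections import Counter

def l1_loglikelihood(corpus_tokens, k=1):
    # Estimate P(u_i | previous k tokens) by counting context -> next-token pairs.
    ctx_next = Counter()
    ctx_tot = Counter()
    for i in range(k, len(corpus_tokens)):
        ctx = tuple(corpus_tokens[i - k:i])
        ctx_next[(ctx, corpus_tokens[i])] += 1
        ctx_tot[ctx] += 1
    # Sum the log conditional probabilities over the corpus.
    ll = 0.0
    for i in range(k, len(corpus_tokens)):
        ctx = tuple(corpus_tokens[i - k:i])
        ll += math.log(ctx_next[(ctx, corpus_tokens[i])] / ctx_tot[ctx])
    return ll
```

A perfectly predictable corpus has log-likelihood 0; any uncertainty about the next token makes it negative, which is what training pushes upward.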
the fine tuning section requires special preprocessing of the data set, and the final processing structure is { "theme": "title 1" \n\n "title 2" \n "title 3" … "title n" }, wherein title n represents the title of the nth section of the presentation.
After model training of the fine-tuning part converges, inference is performed with the model: a text theme is acquired and input into the fully trained fine-tuned model, and the output finally obtained is the presentation's title text.
In this embodiment, pre-training and fine-tuning are performed on the GPT model. A model size of 1.5 billion parameters is chosen, and the distributed training is implemented with Megatron-LM. Chinese datasets such as WebQA, WIKI, XQuAD, and DuReader Dataset can be used, with a total size above 20 GB. Each sample is processed into the {"text": "topic\n\ntopic"} format and input into the GPT model for full pre-training, which stops once the loss curve converges below 1. Next, supervised fine-tuning is performed on the fully pre-trained model, again implemented with Megatron-LM, except that the dataset is processed as {"topic": "Title 1\nTitle 2\nTitle 3 …"}, where Title n denotes the title of the n-th part of the presentation. After the fine-tuning loss curve converges, the theme "safe electricity use" is input into the title fine-tuning module, and the title text finally output for the presentation is "important significance of safe electricity use", "cases and classification of electric shock accidents", "effective preventive measures for safe electricity use", and "publicity of safe electricity use".
In a specific implementation, large language models such as LLaMA, ChatGLM, BERT, and Firefly can also be selected for the title generation model; different open-source models may differ in how they are pre-trained and fine-tuned, and the approach can be chosen per task. For example, for ChatGLM, the fine-tuning method described in the open-source project THUDM/ChatGLM-6B on GitHub (ChatGLM-6B: An Open Bilingual Dialogue Language Model) can be used as a reference: the ChatGLM model is fine-tuned with P-Tuning v2, which reduces the number of parameters to be fine-tuned to 0.1% of the original, and the model is further tuned with methods such as model quantization and GPU memory optimization. If full-parameter fine-tuning is required, the DeepSpeed tool is installed to fine-tune the model.
The content generation model is used to generate the corresponding content text under each secondary title according to that title.
Specifically, further fine-tuning is performed based on the fully pre-trained model from the title generation model. The fine-tuning step is consistent with the scheme in the title generation module, but the dataset must be replaced: here a question-answer dataset is selected (constructed from self-collected data or downloaded open-source datasets), in the format {"title 1": "content of title 1"}, {"title 2": "content of title 2"} … {"title n": "content of title n"}. With this as input and the fully trained pre-training model as the base, fine-tuning proceeds until the model converges, outputting a content fine-tuned model.
The titles produced by inference in the title generation module are then input in turn into the fully trained fine-tuned model for inference, generating the text content under each title.
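This two-stage inference (titles first, then content per title) can be sketched as follows; `title_model` and `content_model` stand for the two fine-tuned generators described above, and any callables with these signatures would do:

```python
# Hypothetical two-stage inference sketch: the title model proposes secondary
# titles for the theme, then the content model fills in the text under each
# title, yielding the presentation's full text content.
def generate_presentation_text(theme, title_model, content_model):
    titles = title_model(theme)
    return {title: content_model(title) for title in titles}
```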
In this embodiment, further fine-tuning is performed based on the fully trained 1.5-billion-parameter GPT model. The fine-tuning step is consistent with the scheme in the title generation module, but the dataset is replaced: a question-answer dataset is selected here, which can be extracted from presentations and processed into the formats {"title 1": "content of title 1"}, {"title 2": "content of title 2"} … {"title n": "content of title n"}. With this as input and the fully trained pre-training model as the base, fine-tuning proceeds until the model converges, outputting a content fine-tuned model. "Importance of safe electricity use", "classification of electric shock accidents", "preventive measures for safe electricity use", and "eight principles of safe electricity use" are input in turn into the fully trained fine-tuned model for inference, generating the text content under each title:
{ "importance of safe electricity use": "Safe electricity use means complying with safety regulations when using electrical appliances, to prevent accidents such as electric shock, short circuit, and fire. Safe electricity use protects people's lives and property and reduces unnecessary losses." }
{ "classification of electric shock accidents": "mild electric shock accident: the electric shock current is less than 10mA; there may be perceived and involuntary muscle contraction, and the hands get rid of the electrodes and have become difficult, and each joint of the human body has a pain feeling, and can get rid of the charged body by themselves. An n/n moderate electric shock accident: the electric shock current is below 50 milliamperes, and reaches the tolerance limit of the human body, and if the life is dangerous due to untimely rescue. N/n severe electric shock accident: 100mA or more; pathophysiological effects, asystole, respiratory arrest, and burns and other cellular damage may occur, and three seconds later the heart begins to paralyze, stopping beating. "}
{ "preventive measure for safe use": "no wires are pulled in dormitory, no inserts are discharged to the bed head or fixed to the bed frame. And n/n does not purchase and use the low-cost inferior power strip, and the students distinguish and distinguish the high-quality inferior distinction. And n/n is not a high-power electric appliance such as an electric heating pot, a quick heating device, an electromagnetic oven and the like. An/n is not always used, and the electric appliance needs to be powered off in time. The power supply is periodically powered off to clean dust on the power strip, wet rags are not required to be used for wiping, and the power strip can be used for wiping with dry rags or dipping alcohol for drying. Periodic power-off of n/n is carried out to clean dust on the power strip, a wet rag is not required to be used for wiping, a dry rag or alcohol dipping is used for wiping and drying, and the power strip is used }
{ "eight principles for safe use": first, firmly grasp the common sense of safe electricity. And n/n is strictly in compliance with the operation regulations of electric equipment. And thirdly, checking hidden danger of safe electricity utilization. And n/n is the responsibility system for establishing sound safety electricity utilization. N\n five is an effective management system. And n/n six is the electricity utilization inspection of the tissue safety. And n seven is the organization safety electricity utilization education. N/n eight is the tissue safety electricity utilization experience communication "}
In the specific implementation of S12, the text content is divided into different parts, each part serving as one page of the presentation, and keyword extraction is performed on all pages other than the home page and the catalog page;
step S121: the output of the text generation module is integrated: the theme input by the user serves as the title of the presentation, the secondary titles arranged in order serve as the catalog of the presentation, and each generated secondary title serves as the title of its page of the presentation;
specifically, the data obtained after the text content is divided according to the division rule is in the following form:
{ "topic": [ "title 1", "title 2", "title 3", … "," title n "] },
{ "title 1": "content generated based on title 1" },
…,
{ "title n": "Contents generated based on title n" })
The data in the above form is stored in json format.
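A few lines of Python suffice to assemble and store this structure; the helper name and output file path below are illustrative, not part of the method:

```python
import json

def build_presentation_data(topic, sections):
    """Assemble the divided text: one catalog record listing the secondary
    titles in order, followed by one record per title with its content."""
    data = [{topic: list(sections.keys())}]  # catalog: ordered secondary titles
    for title, content in sections.items():  # one entry per content page
        data.append({title: content})
    return data

data = build_presentation_data(
    "topic",
    {"title 1": "content generated based on title 1",
     "title n": "content generated based on title n"},
)
with open("presentation.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```

Python dictionaries preserve insertion order, so the catalog list and the per-page records stay in the order in which the titles were generated.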
In this embodiment, the divided contents are:
{
{ "safe use of electricity": the key points of the safety electricity are the importance, the classification of electric shock accidents, the prevention measure of the safety electricity, the eight principle of the safety electricity,
{ "importance of safe electricity use": the term "safe electricity use" refers to the use of electric appliances, which is to comply with the regulations of safe electricity use to prevent accidents such as electric shock, short circuit, fire, etc. The safe electricity utilization can protect the life safety of people and the property safety, and reduce unnecessary loss. "},
{ "classification of electric shock accidents": "mild electric shock accident: the electric shock current is less than 10mA; there may be perceived and involuntary muscle contraction, and the hands get rid of the electrodes and have become difficult, and each joint of the human body has a pain feeling, and can get rid of the charged body by themselves. An n/n moderate electric shock accident: the electric shock current is below 50 milliamperes, and reaches the tolerance limit of the human body, and if the life is dangerous due to untimely rescue. N/n severe electric shock accident: 100mA or more; pathophysiological effects, asystole, respiratory arrest, and burns and other cellular damage may occur, and three seconds later the heart begins to paralyze, stopping beating. "},
{ "preventive measure for safe use": "no wires are pulled in dormitory, no inserts are discharged to the bed head or fixed to the bed frame. And n/n does not purchase and use the low-cost inferior power strip, and the students distinguish and distinguish the high-quality inferior distinction. And n/n is not a high-power electric appliance such as an electric heating pot, a quick heating device, an electromagnetic oven and the like. An/n is not always used, and the electric appliance needs to be powered off in time. The power supply is periodically powered off to clean dust on the power strip, wet rags are not required to be used for wiping, and the power strip can be used for wiping with dry rags or dipping alcohol for drying. The power is periodically cut off to clean dust on the power strip, the wet rag is not required to be used for wiping, the dry rag or the alcohol dipping wiping can be used for airing,
{ "eight principles for safe use": first, firmly grasp the common sense of safe electricity. And n/n is strictly in compliance with the operation regulations of electric equipment. And thirdly, checking hidden danger of safe electricity utilization. And n/n is the responsibility system for establishing sound safety electricity utilization. N\n five is an effective management system. And n/n six is the electricity utilization inspection of the tissue safety. And n seven is the organization safety electricity utilization education. N/n eight is the tissue safety electricity utilization experience communication "}
}
Step S122: a keyword extraction module is constructed, and keywords are extracted from the title and content of each presentation page.
Specifically, a keyword extraction model is constructed for the { "title": "content generated based on the title" } pairs: each pair is input into the model, and 2-3 keywords are output. In one embodiment, the keyword extraction model may adopt the TF-IDF keyword extraction algorithm. TF-IDF (term frequency-inverse document frequency) is a common weighting technique in information retrieval and text mining. It is a statistical method for evaluating how important a word is to a document in a collection or corpus: the importance of a word increases proportionally with the number of times it appears in the document, but decreases inversely with its frequency across the corpus. The main idea is that if a word appears with a high term frequency TF in one article but rarely in other articles, the word or phrase is considered to have good class-discrimination ability and to be suitable for classification.
TF is the term frequency and IDF is the inverse document frequency; TF-IDF models text using both. TF represents the frequency with which a term appears in a document:

$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$

where $n_{i,j}$ is the number of occurrences of word $t_i$ in document $d_j$, and the denominator is the sum of the occurrences of all words in document $d_j$.

The IDF of a particular word is obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient:

$$\mathrm{IDF}_{i} = \log \frac{|D|}{|\{\, j : t_i \in d_j \,\}|}$$

where $|D|$ is the total number of documents in the corpus and $|\{ j : t_i \in d_j \}|$ is the number of documents containing the word $t_i$.

TF-IDF is then the product:

$$\mathrm{TF\text{-}IDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_{i}$$

Based on the above formulas, the TF-IDF value of each word is calculated, and the 2-3 words with the highest TF-IDF values are selected as the extracted keywords.
In the present embodiment, for the { "title": "content generated based on the title" } pairs, taking the first title as an example, the input is { "importance of safe electricity use": "Safe electricity use means complying with the regulations for safe electricity use when using electric appliances, so as to prevent accidents such as electric shock, short circuit, and fire. Safe electricity use protects people's lives and property and reduces unnecessary losses." }. The TF-IDF value of each word is calculated with the TF-IDF keyword extraction model, and the 3 words with the largest values, "electric appliance", "electric shock", and "fire", are selected as keywords.
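The TF-IDF computation can be sketched in plain Python over pre-tokenized documents (a minimal version for illustration; in practice Chinese text would first be segmented by a tokenizer and stop-words would be filtered):

```python
import math
from collections import Counter

def tfidf_top_keywords(docs, doc_index, k=3):
    """Score the words of docs[doc_index] by TF-IDF and return the top k.
    Each document is a list of already-tokenized words."""
    tf = Counter(docs[doc_index])
    n_words = sum(tf.values())
    n_docs = len(docs)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in docs if word in d)   # documents containing the word
        idf = math.log(n_docs / df)              # inverse document frequency
        scores[word] = (count / n_words) * idf   # TF * IDF
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

# Toy corpus: the target document plus two background documents.
docs = [
    ["electric", "appliance", "fire", "electric", "shock"],
    ["safety", "rules", "management"],
    ["safety", "education", "rules"],
]
print(tfidf_top_keywords(docs, 0, k=3))
```

Words frequent in the target document but absent from the background documents score highest, matching the class-discrimination intuition described above.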
In a specific implementation, keyword extraction may also use an API provided by an open-source large language model, for example ChatGPT, ChatGLM, Firefly, or LLaMA. The input to the API call is: "Fish-flavored shredded pork is made with carefully selected ingredients; the finished dish is red and glossy in color and rich in fish-flavor aroma, and when eaten the shredded pork is salty, sweet, sour, and spicy in taste and soft and tender in texture. Extract the keywords of this passage." The API output is: "fish-flavored shredded pork, ruddy color, smooth texture".
In the specific implementation of S13, a matching image for each page of the presentation is generated by the text-to-image module based on the extracted keywords;
specifically, the text-generated image model may choose to call an API that is currently open-sourced, such as a text-generated image model like stable-diffusion, DALL-E2, and the like. Fine tuning training may also be performed on the basis of an open-source pre-training model. In the part, only the theme image pairs in the presentation file are needed to be arranged into the input format of the corresponding model, and then the open source pre-training model is finely adjusted until the model converges. And then sequentially inputting the 2-3 keywords extracted by the keyword extraction module into a fine-tuning model of the text-generated graph to generate 2-3 images. And taking the generated image as a map of the page presentation. So far, the configuration diagram and the text part of the presentation are generated and completed.
In this embodiment, again taking the first title as an example, the keywords "electric appliance", "electric shock", and "fire" are input into the open-source API of the stable-diffusion model to generate 3 corresponding pictures.
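A hedged sketch of this step using the open-source `diffusers` library; the checkpoint name and prompt template are assumptions for illustration, and the pipeline call is wrapped in a function so that it only runs when a model and GPU are actually available:

```python
def keywords_to_prompt(keywords):
    """Combine the 2-3 extracted keywords into one text-to-image prompt."""
    return ", ".join(keywords) + ", clean flat illustration for a slide"

def generate_images(prompt, n=3, model="runwayml/stable-diffusion-v1-5"):
    """Call an open-source text-to-image pipeline (requires `diffusers` and
    `torch`; the model name is illustrative). Not invoked in this sketch."""
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained(model)
    return [pipe(prompt).images[0] for _ in range(n)]

prompt = keywords_to_prompt(["electric appliance", "electric shock", "fire"])
print(prompt)
```

The returned images would then be saved and handed to the typesetting step as the matching images for that page.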
In the specific implementation of S14, the divided text content and the matching images of the corresponding pages are automatically typeset to obtain a complete presentation;
specifically, based on the generated text and image, each page of presentation is typeset using python-pptx, and finally the presentation is output. python-pptx is the python library used to create and update PowerPoint files. The generation of a custom ppt presentation from database content may be used in conjunction with the web, with batch updates to the presentation library by clicking on a link in the web application to download the presentation. May be used to add slides, populate text placeholders, add images, text boxes, add operational graphics, titles, theme properties, flowcharts, etc., and add slides in forms, etc.
As can be seen from the above embodiments, the present application develops a new scheme based on the existing pipeline. On the one hand, the text generation model that produces the text content of the presentation is split into two parts: a presentation title generation module and a presentation content generation module. The first module only needs the structure of primary and secondary titles, for which data sets are available; the second module can use the now open-source question-answer data sets. On the other hand, the details of the overall pipeline are provided: a complete presentation can be generated from a single theme, achieving efficient, accurate, and personalized presentation production, saving users time and energy, greatly improving the quality and effect of presentations, and helping users better present their ideas. The method forms a good practical application of combining the text generation task in natural language processing with the text-to-image task in the multimodal field, and has important theoretical significance and practical value for the rapid development of both fields.
The application also provides an embodiment of the presentation generating device corresponding to the embodiment of the presentation generating method.
Fig. 3 is a block diagram of a presentation generating device, according to an example embodiment. Referring to fig. 3, the apparatus may include:
an obtaining module 21, configured to obtain the theme of the presentation and to obtain all the text content of the presentation based on a pre-constructed and pre-trained text generation module;
a keyword extraction module 22, configured to divide the text content into different parts, take each part as one page of the presentation, and perform keyword extraction on all pages other than the home page and the catalog page;
an image generation module 23, configured to generate, based on the extracted keywords, the matching image corresponding to each page of the presentation through the text-to-image module;
and a typesetting module 24, configured to automatically typeset the divided text content and the matching image of the corresponding page to obtain a complete presentation.
The specific manner in which the various modules perform operations in the apparatus of the above embodiment has been described in detail in the embodiments of the method, and will not be elaborated here.
For the apparatus embodiments, since they essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without undue burden.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the presentation generation method described above. As shown in fig. 4, which is a hardware structure diagram of an arbitrary device with data processing capability according to the present application, in addition to the processor, memory, and network interface shown in fig. 4, such a device may, according to its actual function, further include other hardware, which will not be described here.
Correspondingly, the present application further provides a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the presentation generation method described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. It may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (8)

1. A presentation generating method, comprising:
acquiring a theme for generating a presentation, and acquiring a secondary title of the presentation and text contents under each secondary title based on a text generation module which is constructed and trained in advance;
structuring the theme, the secondary title and the text content under each secondary title of the presentation to obtain a plurality of parts, taking each part as one presentation page, and extracting keywords from other pages except the first page and the catalog page;
generating a picture matching image corresponding to each page of presentation manuscript through a text generation image module based on the extracted keywords;
automatically typesetting the divided text content and the map matching image of the corresponding page to obtain a complete presentation;
the text generation module comprises a title generation model and a content generation model, wherein the title generation model is used for generating a secondary title of a presentation according to the theme of the presentation, and the content generation model is used for generating corresponding text content under each secondary title according to the secondary title;
wherein, the training of the title generation model comprises a pre-training stage and a fine tuning stage:
the loss function of the pre-training stage is

$$L_1(U) = \sum_{i} \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)$$

where $k$ is the size of the context window, $P$ is the conditional probability, $\Theta$ denotes the model parameters, and $U = \{u_1, \ldots, u_n\}$ is the unsupervised corpus;

the loss function of the fine-tuning stage is

$$L_3(C) = L_2(C) + \lambda \cdot L_1(C), \qquad L_2(C) = \sum_{(x, y)} \log P\left(y \mid x_1, \ldots, x_m\right)$$

where $C$ is the labelled Chinese data set, each sample consisting of a sequence of Chinese words $\{x_1, \ldots, x_m\}$ and a label $y$; $x_i$ is a Chinese word in $C$, $m$ is the number of Chinese words in $C$, and $\lambda$ is a weight parameter.
2. The method of claim 1, wherein the second-level title of the presentation is obtained by selecting one of the following ways according to the actual situation of the user:
(1) directly using a self-set presentation secondary title provided by the user;
(2) based on a preset presentation template, directly filling the theme into the titles of each level;
(3) acquiring the theme for generating the presentation, inputting the theme into the title generation model, and generating the secondary titles of the presentation.
3. The method of claim 1, wherein the content generation model is obtained by fine tuning a title generation model after the pre-training phase.
4. The method of claim 1, wherein structuring the theme, the secondary title, and the text under each secondary title of the presentation to obtain a plurality of parts, and extracting keywords from pages other than the top page and the catalog page by using each part as a presentation page, comprises:
taking the theme of the presentation as the title of the presentation, arranging each secondary title in sequence as a catalog of the presentation, taking the generated secondary title as the title of each page of the presentation, and taking the text content under each secondary title as the content generated based on the title of the corresponding page;
and constructing a keyword extraction module, and extracting keywords from the title of each page of the presentation and the content generated based on the title.
5. The method according to claim 4, wherein, for the title of each page of the presentation and the content generated based on the title, a word frequency-reverse document frequency value of each word is calculated, and a plurality of words having the greatest word frequency-reverse document frequency values are used as keywords.
6. A presentation generating apparatus, comprising:
the acquisition module is used for acquiring the theme of the generated presentation, and acquiring all text contents of the presentation based on the text generation module which is constructed and trained in advance;
the keyword extraction module is used for structuring the theme, the secondary title and the text content under each secondary title of the presentation to obtain a plurality of parts, taking each part as one page of the presentation, and extracting keywords from other pages except the first page and the catalog page;
the image generation module is used for generating a picture matching image corresponding to each page of presentation through the text generation image module based on the extracted keywords;
the typesetting module is used for automatically typesetting the divided text content and the matched image of the corresponding page to obtain a complete presentation;
the text generation module comprises a title generation model and a content generation model, wherein the title generation model is used for generating a secondary title of a presentation according to the theme of the presentation, and the content generation model is used for generating corresponding text content under each secondary title according to the secondary title;
wherein, the training of the title generation model comprises a pre-training stage and a fine tuning stage:
the loss function of the pre-training stage is

$$L_1(U) = \sum_{i} \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)$$

where $k$ is the size of the context window, $P$ is the conditional probability, $\Theta$ denotes the model parameters, and $U = \{u_1, \ldots, u_n\}$ is the unsupervised corpus;

the loss function of the fine-tuning stage is

$$L_3(C) = L_2(C) + \lambda \cdot L_1(C), \qquad L_2(C) = \sum_{(x, y)} \log P\left(y \mid x_1, \ldots, x_m\right)$$

where $C$ is the labelled Chinese data set, each sample consisting of a sequence of Chinese words $\{x_1, \ldots, x_m\}$ and a label $y$; $x_i$ is a Chinese word in $C$, $m$ is the number of Chinese words in $C$, and $\lambda$ is a weight parameter.
7. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any of claims 1-5.
CN202310819781.6A 2023-07-06 2023-07-06 Presentation generation method and device Active CN116579308B (en)

Publications (2)

Publication Number Publication Date
CN116579308A CN116579308A (en) 2023-08-11
CN116579308B true CN116579308B (en) 2023-10-10




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant