CN117436415A - Presentation generation method and device, electronic equipment and storage medium


Info

Publication number
CN117436415A
Authority
CN
China
Prior art keywords: target, text, presentation, outline, node
Legal status: Pending
Application number
CN202311385580.6A
Other languages
Chinese (zh)
Inventor
王肃晨
丁健
胡谦
胡慧阳
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN202311385580.6A
Publication of CN117436415A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering


Abstract

The invention provides a presentation generation method and apparatus, an electronic device, and a storage medium. The method comprises: receiving a presentation generation instruction, wherein the instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form; performing text extraction on the description information to obtain a target text; performing outline extraction on the target text to obtain a target outline; and extracting the presentation text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the presentation text of each node. The method, apparatus, electronic device, and storage medium provided by the invention can automatically generate high-quality presentations and improve presentation generation efficiency.

Description

Presentation generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for generating a presentation, an electronic device, and a storage medium.
Background
With the widespread adoption of office software, presentations are used in many aspects of social life, for example in work reports, enterprise announcements, product promotion, project bidding, management consulting, education, and training. Presentations have become an indispensable tool in modern work.
However, the conventional process of creating a presentation is cumbersome and time-consuming, requiring the user to select a theme, design the layout, edit the content, adjust the formatting, and so on, which makes presentation creation inefficient. Moreover, a manually created presentation may suffer from inconsistencies in style, font, and typesetting, so that the overall effect is neither uniform nor professional, which affects the quality of the presentation.
Disclosure of Invention
The invention provides a presentation generation method and apparatus, an electronic device, and a storage medium, which address the low efficiency and poor quality of presentation generation in the prior art.
The invention provides a presentation generation method, comprising:
receiving a presentation generation instruction, wherein the instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form;
performing text extraction on the description information to obtain a target text;
performing outline extraction on the target text to obtain a target outline;
and extracting the presentation text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the presentation text of each node in the target outline.
According to the presentation generation method provided by the invention, performing text extraction on the description information to obtain the target text comprises:
when the description information comprises audio, performing speech transcription on the audio to obtain a transcribed text, and performing text normalization on the transcribed text to obtain the target text;
when the description information comprises video, performing image extraction on the video, and performing text recognition on each extracted image to obtain the target text;
and when the description information comprises both audio and video, performing speech transcription on the audio to obtain a transcribed text, performing image extraction on the video, performing text recognition on each extracted image, and comparing the text recognition result with the transcribed text to obtain the target text.
According to the presentation generation method provided by the invention, extracting the presentation text of each node in the target outline from the target text comprises:
extracting the node outline information associated with each node from the target outline;
and matching the node outline information of each node against the text segments in the target text, and determining the presentation text of each node based on the successfully matched segments.
According to the presentation generation method provided by the invention, determining the presentation text of a node based on the successfully matched segments comprises:
when the description information comprises video, associating each image contained in the video with a text segment in the target text;
determining the text content of the node based on the successfully matched segments, and determining the presentation image of the node based on the images associated with the successfully matched segments;
and determining the presentation text of the node based on the text content and the presentation image of the node.
According to the presentation generation method provided by the invention, performing outline extraction on the target text to obtain the target outline comprises:
extracting keywords from the target text, and determining the title of the target presentation based on the extracted keywords;
layering the target text, and extracting node titles based on the layering result;
and sorting the node titles, and determining the target outline based on the title and the sorted node titles.
According to the presentation generation method provided by the invention, the node titles comprise primary titles and secondary titles;
layering the target text and extracting node titles based on the layering result comprises:
layering the target text, and extracting the primary titles based on the layering result;
and matching each primary title against the text segments in the target text, and generating the secondary titles under each primary title based on the segments matched with that primary title.
According to the presentation generation method provided by the invention, generating the target presentation containing the target text based on the target outline and the presentation text of each node in the target outline comprises:
determining a theme template of the target presentation, the theme template being determined based on the target text and/or user input;
and filling the target outline and the presentation text of each node in the target outline into the theme template to obtain the target presentation.
According to the presentation generation method provided by the invention, after generating the target presentation containing the target text based on the target outline and the presentation text of each node in the target outline, the method further comprises:
displaying the target presentation, and determining a modification operation;
determining the content to be modified and the modification type in the target presentation based on the modification operation;
performing the modification of the given type on the content to be modified to obtain the target content;
and adjusting the target presentation based on the target content, and displaying the adjusted target presentation.
According to the presentation generation method provided by the invention, adjusting the target presentation based on the target content comprises:
adjusting the presentation notes text of the target presentation based on the target content, wherein the presentation notes text is determined based on the content displayed on each page of the target presentation, the content comprising at least one of text, tables, and images.
The invention also provides a presentation generation apparatus, comprising:
an instruction receiving unit, configured to receive a presentation generation instruction, wherein the instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form;
a text extraction unit, configured to perform text extraction on the description information to obtain a target text;
an outline extraction unit, configured to perform outline extraction on the target text to obtain a target outline;
and a presentation generation unit, configured to extract the presentation text of each node in the target outline from the target text, and to generate a target presentation containing the target text based on the target outline and the presentation text of each node in the target outline.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the presentation generation method described in any of the above.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the presentation generation method described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the presentation generation method described in any of the above.
According to the method, apparatus, electronic device, and storage medium for generating a presentation provided by the invention, text extraction is performed on the description information carried in the presentation generation instruction to obtain the target text, outline extraction is performed on the target text to obtain the target outline, and the presentation text of each node in the target outline is extracted from the target text, so that the target presentation can be generated automatically based on the target outline and the presentation text of each node. This saves the time and labor of creating a presentation manually and improves working efficiency, and the generated presentation has consistent styling and typesetting, ensuring a unified and professional overall appearance. In addition, because the description information describes the content of the presentation to be generated in audio and/or video form, the corresponding presentation can be generated quickly from a piece of audio and/or video provided by the user, which is simple to operate and highly efficient.
Drawings
To describe the technical solutions of the present invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. It is apparent that the drawings described below show some embodiments of the present invention, and that a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of the presentation generation method provided by the present invention;
Fig. 2 is a schematic flow chart of extracting presentation text provided by the present invention;
Fig. 3 is a schematic flow chart of adjusting the target presentation provided by the present invention;
Fig. 4 is a schematic structural diagram of the presentation generation apparatus provided by the present invention;
Fig. 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the conventional way of creating a presentation, a user must spend a great deal of time and effort selecting a theme, designing the layout, editing the content, adjusting the formatting, and so on, and a manually created presentation may suffer from inconsistencies in style, font, and typesetting, so that the overall effect is neither uniform nor professional, which affects the quality of the resulting presentation. To address these issues, more and more users are exploring presentation generation tools and automation techniques to improve production efficiency and quality.
Currently, there are two main ways to obtain a presentation from a web page. The first is static presentation downloading: a presentation is prepared in advance and uploaded to the system, and when the user clicks to download, the pre-made presentation is delivered. However, the content and template of a presentation obtained this way are fixed, and the user must do a great deal of modification and adjustment to adapt them to their own needs. The second is dynamic presentation downloading: the user selects a presentation template on the page and fills the content corresponding to each blank of the template into a page form. The user not only has to fill in the content manually throughout, but also has to pay close attention to the correspondence between the filled content and the blanks in the template; a single filling mistake easily produces a presentation that does not match expectations, which increases the workload. Both methods suffer from low efficiency. In view of this, an embodiment of the present invention provides a presentation generation method that can automatically generate a presentation based on audio and/or video provided by the user, greatly improving working efficiency and overcoming the above shortcomings.
Fig. 1 is a schematic flow chart of the presentation generation method provided by the present invention. As shown in Fig. 1, the method includes:
Step 110, receiving a presentation generation instruction, wherein the instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form.
Specifically, the presentation to be generated refers to a presentation that has not yet been created, i.e., one that needs to be generated according to a theme specified by the user. It may be a PPT (PowerPoint) file or a PPT-like presentation. The presentation generation instruction is an instruction containing description information and directives about the presentation to be generated, which may include descriptions of its content, layout, style, and other elements. Here, the description information for the presentation to be generated refers to information describing the presentation's content, layout, style, and other elements.
When the user needs to generate a presentation, a presentation generation instruction can be issued by text input with a keyboard or other input device, by voice input or video recording, or by uploading a document, audio, or video.
Step 120, performing text extraction on the description information to obtain a target text.
Specifically, the description information may describe the content of the presentation to be generated in audio and/or video, so text extraction must be performed on the description information before the target presentation is generated, to obtain the target text. For example, a voice or audio file input by the user may be transcribed to obtain the target text; a video may first be decoded, images extracted from the decoded video frames, and the text information then extracted with OCR (Optical Character Recognition) technology to obtain the target text.
Step 130, performing outline extraction on the target text to obtain a target outline.
Specifically, after the target text is obtained, outline extraction may be performed on it to obtain the target outline. The target outline is the overall framework of the presentation to be generated; it may comprise the main chapters, sub-topics, and key content of the presentation, and it provides a skeleton and guidance for the subsequent filling and organization of content, ensuring the logic and continuity of the presentation.
Here, the outline extraction may be performed with a large language model (LLM), which may be, for example, the iFlytek Spark large model, a BERT (Bidirectional Encoder Representations from Transformers) model, or a RoBERTa (Robustly Optimized BERT approach) model; the embodiment of the present invention does not limit the choice.
In addition, the user may describe the outline of the presentation in the audio or video, in which case the large language model can extract the target outline directly from the outline-related content in the target text.
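As an illustration of this step, the sketch below shows one way a chat-style LLM could be prompted to return a hierarchical outline. The prompt text, the JSON shape, and the `call_llm` helper are assumptions made for illustration; the patent does not disclose concrete prompts or APIs.

```python
# Illustrative sketch: the patent does not disclose concrete prompts or APIs.
import json

OUTLINE_PROMPT = (
    "You are preparing a slide deck. Read the following transcript and return "
    "a JSON object with a deck 'title' and a list of 'nodes', each node having "
    "a 'title' and an optional list of 'subtitles'.\n\nTranscript:\n{text}"
)

def extract_outline(target_text: str, call_llm) -> dict:
    """Ask an LLM (e.g., iFlytek Spark) for a hierarchical outline;
    `call_llm` is a caller-supplied function taking a prompt and returning text."""
    reply = call_llm(OUTLINE_PROMPT.format(text=target_text))
    return json.loads(reply)  # e.g. {"title": ..., "nodes": [{"title": ..., "subtitles": [...]}]}
```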
Further, given that the generated target outline may contain passages that do not fully satisfy the user, after the target outline is generated it may be displayed so that the user can modify and adjust it. For example, when the user hovers the mouse over any outline title in the target outline, edit and delete icons are displayed; clicking the edit icon allows the corresponding outline title to be modified, and clicking the delete icon deletes that outline title.
Preferably, during the user's modifications, the large language model can also check the content the user edits; if it detects that the edited content does not match the previously generated target outline, it can prompt the user, providing a friendlier interactive experience. In addition, if the generated target outline does not match the user's expectations, the user can trigger the regeneration of a new target outline.
Step 140, extracting the presentation text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the presentation text of each node in the target outline.
Specifically, each node corresponds to a chapter in the target outline, and the chapters form the structure and organizational framework of the presentation. The presentation text of a node refers to the specific content of that chapter; for example, it may be text, diagrams, images, or video used to present and communicate with the audience.
After the target outline is generated, the presentation text of each node can be extracted from the target text. It should be understood that the presentation text may be content taken directly from the target text, or content obtained by rewriting, condensing, polishing, or expanding the extracted content; the embodiment of the present invention does not limit this. Here, the extraction may be performed with a large language model, for example the iFlytek Spark large model, a BERT model, or a RoBERTa model, which the embodiment of the present invention likewise does not limit. It can be understood that by extracting the presentation text of the target outline from the target text, the extracted content stays consistent with the original description information, avoiding misunderstanding and omission, improving the accuracy and consistency of the presentation, and ensuring its quality and effect.
After the presentation text of each node is extracted, a corresponding slide can be created for each node in the target outline, a suitable slide layout and theme template selected according to the presentation's needs and style, and the presentation text of each node filled into the corresponding slides in order, thereby constructing the target presentation. The target presentation is the presentation generated from the target text and the target outline. It may be a single PPT page or comprise multiple PPT pages; in the latter case it may include a cover page, a table of contents page, chapter pages, body pages, an ending page, and so on.
According to the method provided by the embodiment of the invention, text extraction is performed on the description information carried in the presentation generation instruction to obtain the target text, outline extraction is performed on the target text to obtain the target outline, and the presentation text of each node in the target outline is extracted from the target text, so that the target presentation can be generated automatically based on the target outline and the presentation text of each node. This saves the time and labor of creating a presentation manually and improves working efficiency, and the generated presentation has consistent styling and typesetting, ensuring a unified and professional overall appearance. In addition, because the description information describes the content of the presentation to be generated in audio and/or video form, the corresponding presentation can be generated quickly from a piece of audio and/or video provided by the user, which is simple to operate and highly efficient.
Based on the above embodiment, step 120 specifically includes:
when the description information comprises audio, performing speech transcription on the audio to obtain a transcribed text, and performing text normalization on the transcribed text to obtain the target text;
when the description information comprises video, performing image extraction on the video, and performing text recognition on each extracted image to obtain the target text;
and when the description information comprises both audio and video, performing speech transcription on the audio to obtain a transcribed text, performing image extraction on the video, performing text recognition on each extracted image, and comparing the text recognition result with the transcribed text to obtain the target text.
Specifically, the description information may describe the content of the presentation to be generated in audio and/or video. When the description information includes audio, speech transcription can be performed on the audio to obtain a transcribed text. Here, speech transcription refers to the process of converting the speech content of the audio into text form. The transcribed text is an editable, processable text converted from the audio; it is the result of transcribing the speech content. A suitable speech transcription tool or service may be chosen for this, for example the iFlytek Open Platform or iFlytek's speech transcription and cloud speech recognition services.
To obtain clearer, more coherent written text, after the transcribed text is obtained it can be further normalized to produce the target text. Normalization here refers to modifying, adjusting, and polishing the transcribed text to make it more readable and understandable. For example, normalization may include repairing grammar and spelling errors, adding punctuation and formatting, and segmenting and adjusting the text structure to make the text clearer, more coherent, and easier to understand; it may also include filtering forbidden words, sensitive words, and repeated words in the transcribed text, or polishing and expanding the transcribed text. The normalization rules may also be determined according to the user's requirements and/or the current instruction information; the embodiment of the present invention does not limit this.
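A minimal, rule-based sketch of the kind of normalization described above (whitespace cleanup, removal of immediately repeated words, masking of forbidden words); in practice an LLM-based rewriting pass for grammar repair or polishing could follow. The `FORBIDDEN_WORDS` list is a placeholder assumption.

```python
import re

FORBIDDEN_WORDS = {"placeholder_forbidden_word"}  # assumed; real lists are deployment-specific

def normalize_transcript(transcript: str) -> str:
    """Rule-based first pass over ASR output; an LLM rewriting pass for grammar
    repair, polishing, or expansion could follow."""
    text = re.sub(r"\s+", " ", transcript).strip()     # collapse whitespace
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text)     # drop immediate word repetitions
    for word in FORBIDDEN_WORDS:
        text = text.replace(word, "*" * len(word))     # mask forbidden/sensitive words
    if text and text[-1] not in ".!?。":
        text += "."                                    # ensure terminal punctuation
    return text
```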
When the description information includes video, image extraction can be performed on the video, and each frame can be extracted for subsequent processing and analysis. Here, image extraction refers to obtaining a single image or a sequence of consecutive images from the video. The video may first be decoded, converting the compressed encoding of the video file into raw image data; for example, a video decoding library (e.g., FFmpeg) can be used to decode the video file into a series of video frames by calling the relevant decoding functions. The decoded video frames can then be extracted frame by frame by traversing the video's timeline; each video frame is an image and can be saved as an image file (e.g., in JPEG or PNG format) or used directly as image data in subsequent processing.
After the images are extracted, text recognition can be performed on them, converting the text in each image into an editable, processable form. Text recognition may be implemented with Optical Character Recognition (OCR) technology: the characters in an image are recognized and converted into editable text, thereby yielding the target text.
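A sketch of the frame-sampling and OCR pipeline described above, assuming OpenCV for decoding and Tesseract (via pytesseract) for text recognition; the patent names FFmpeg and OCR generically, so the specific libraries and the sampling interval are illustrative choices.

```python
import cv2           # pip install opencv-python
import pytesseract   # pip install pytesseract (requires the Tesseract binary)

def video_to_text(path: str, every_n_frames: int = 30) -> list[tuple[float, str]]:
    """Sample frames from a video and run OCR on each; returns (timestamp_sec, text) pairs.
    Sampling every Nth frame instead of every frame is an illustrative choice."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS metadata is missing
    results: list[tuple[float, str]] = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # OCR is more robust on grayscale
            text = pytesseract.image_to_string(gray, lang="chi_sim+eng").strip()
            if text:
                results.append((index / fps, text))
        index += 1
    cap.release()
    return results
```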
When the description information includes both audio and video, the audio can be transcribed to obtain a transcribed text while the video undergoes image extraction, and text recognition is performed on each extracted image to obtain a text recognition result. To further improve the accuracy and readability of the target text, the text recognition result can be compared against the transcribed text and corrected, reducing errors and improving text quality. For example, the text recognition result and the transcribed text may be compared and fused to obtain the target text; if they conflict, the differences and inconsistent positions between them can be examined, with matching and correction performed word by word or sentence by sentence according to the comparison, yielding the target text. Alternatively, either the text recognition result or the transcribed text may be taken as the target text, based on the user's judgment or an automatic judgment for the current scene; the embodiment of the present invention does not limit this.
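One possible word-level fusion of the OCR result and the transcript, sketched with Python's standard difflib; the conflict-resolution policy (transcript wins on conflicts, OCR-only spans kept) is an assumption, since the patent leaves the resolution strategy open.

```python
import difflib

def fuse_ocr_with_transcript(ocr_text: str, transcript: str) -> str:
    """Word-level merge of the OCR result and the speech transcript.
    Policy (an assumption): on conflicts the transcript wins, since ASR is
    usually more reliable for running prose, while OCR-only spans are kept,
    since slides often show wording that is never spoken aloud."""
    t_words, o_words = transcript.split(), ocr_text.split()
    matcher = difflib.SequenceMatcher(None, t_words, o_words)
    merged: list[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            merged.extend(o_words[j1:j2])   # words seen only by OCR
        else:
            merged.extend(t_words[i1:i2])   # equal/replace/delete: keep transcript words
    return " ".join(merged)
```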
Based on any of the above embodiments, Fig. 2 is a schematic flow chart of extracting presentation text. As shown in Fig. 2, in step 140, extracting the presentation text of each node in the target outline from the target text includes:
Step 141, extracting the node outline information associated with each node from the target outline;
Step 142, matching the node outline information of each node against the text segments in the target text, and determining the presentation text of each node based on the successfully matched segments.
Specifically, the node outline information is the outline information associated with each node or topic. In the target outline, each node represents a specific topic or sub-topic, and its node outline information contains related content such as the node's title, summary, and keywords. After the target outline is obtained, its nodes can be determined from its structure and hierarchy, and any node can be located by its identifier or position so that its outline information can be extracted; the extracted node outline information can then be further organized.
After the node outline information associated with each node has been extracted, it can be matched against the text segments in the target text, so that the presentation text of each node can be determined from the successfully matched segments. A text segment here is a paragraph or portion of the target text, typically consisting of one or more sentences. The matching may be performed semantically: semantic matching determines the degree of association between two texts by comparing their semantic similarity, considering not only word-level similarity but also factors such as context, grammar, and semantic structure. Alternatively, the matching may follow the temporal logic of the target text or the order in which the text content was acquired; the embodiment of the present invention does not limit this.
Here, matching the node outline information of each node against the text segments in the target text may be implemented with a large language model, which may be the iFlytek Spark large model, a BERT model, a RoBERTa model, or another language model; the embodiment of the present invention does not limit the choice.
It can be understood that when a successfully matched segment is short, it can be used directly as the presentation text of the node; when it is long, a summary can first be extracted from it and the summary text used as the presentation text. Summary extraction from the matched segments may be performed with a summarization model (such as a text compression model), which may be either extractive or abstractive.
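A minimal sketch of the segment-to-node matching, using TF-IDF cosine similarity as a stand-in for whatever semantic matcher (e.g., an LLM) is actually used; each segment is assigned to its most similar node.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_segments_to_nodes(node_infos: list[str], segments: list[str]) -> dict[int, list[str]]:
    """Assign every text segment to the outline node whose outline information
    it is most similar to; TF-IDF cosine similarity stands in for the semantic
    matcher (e.g., an LLM) that would be used in practice."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(node_infos + segments)
    node_vecs = matrix[: len(node_infos)]
    seg_vecs = matrix[len(node_infos):]
    sims = cosine_similarity(seg_vecs, node_vecs)   # shape: (n_segments, n_nodes)
    assignment: dict[int, list[str]] = {i: [] for i in range(len(node_infos))}
    for seg_idx, segment in enumerate(segments):
        assignment[int(sims[seg_idx].argmax())].append(segment)
    return assignment
```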
According to the method provided by the embodiment of the invention, matching the node outline information of each node against the text segments in the target text and determining the presentation text of each node from the successfully matched segments allows the content related to each node to be identified in the target text more accurately, and ensures the accuracy, relevance, and consistency of the extracted presentation text at both the textual and semantic levels, thereby enhancing the presentation's effect and appeal.
Based on the above embodiment, in step 142, determining the presentation text of a node based on the successfully matched segments includes:
when the description information comprises video, associating each image contained in the video with a text segment in the target text;
determining the text content of the node based on the successfully matched segments, and determining the presentation image of the node based on the images associated with the successfully matched segments;
and determining the presentation text of the node based on the text content and the presentation image of the node.
Specifically, when the description information includes video, each extracted image can be inserted as an accompanying illustration into the corresponding presentation text in the target presentation. To insert each image at the right place, each extracted image can be associated with a text segment in the target text. For example, the video may be decomposed into a series of consecutive image frames and the target text divided into independent segments or paragraphs, and each image frame can then be associated with a segment through their timestamps. Alternatively, image features and segment features can be extracted separately, the similarity between image frames and segments computed with a similarity measure, and each image frame associated with its most similar segment according to the result.
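A sketch of the timestamp-based association described above; the (timestamp, image) and (start, end, text) shapes are illustrative assumptions.

```python
def associate_frames_with_segments(
    frames: list[tuple[float, bytes]],         # (timestamp_sec, image data) pairs
    segments: list[tuple[float, float, str]],  # (start_sec, end_sec, text) pairs
) -> dict[int, list[bytes]]:
    """Attach each sampled frame to the transcript segment whose time span
    covers the frame's timestamp; unmatched frames are simply dropped."""
    mapping: dict[int, list[bytes]] = {i: [] for i in range(len(segments))}
    for timestamp, image in frames:
        for idx, (start, end, _text) in enumerate(segments):
            if start <= timestamp < end:
                mapping[idx].append(image)
                break
    return mapping
```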
After each image contained in the video has been associated with a text segment in the target text, when the presentation text of any node in the target outline is determined, the text content and the presentation image of the node can each be determined from the segments successfully matched with the node's outline information, so that the specific content and form of the node's presentation text follow from its text content and presentation image.
Here, the text content refers to the textual material used to present and convey a specific node or topic during the presentation, for example the text portions and textual descriptions in the presentation. A presentation image refers to image content used to present and convey a specific node or topic during the presentation, for example visual elements such as pictures, charts, and illustrations.
According to the method provided by the embodiment of the invention, by associating each image contained in the video with a text segment in the target text, the images shown in the presentation can support and supplement the textual content, providing a more intuitive, vivid, and convincing form of expression.
Based on any of the above embodiments, step 130 specifically includes:
Step 131, extracting keywords from the target text, and determining the title of the target presentation based on the extracted keywords;
Step 132, layering the target text, and extracting node titles based on the layering result;
Step 133, sorting the node titles, and determining the target outline based on the title and the sorted node titles.
Specifically, keyword extraction is the process of automatically extracting the most representative and important words or phrases from a given text. For example, keywords and phrases may be extracted with techniques such as part-of-speech tagging and named entity recognition, so that the title of the target presentation is determined from the extracted keywords. Here, the title of the target presentation is a title that summarizes and describes its content, determined from the target text and the keyword extraction result.
It may be appreciated that the title of the target presentation may include a main title and a subtitle. Extracting keywords from the target text and determining the title from them may be implemented with a large language model, which may be the iFlytek Spark large model, a BERT model, a RoBERTa model, or another language model; the embodiment of the present invention does not limit the choice.
In addition, when determining the title of the target presentation from the target text, any title present in the target text can be consulted, ensuring that the resulting title is more accurate and better matches the user's expectations.
After the title of the target presentation is determined, semantic analysis can be performed on the title and the target text, and the target text can be layered based on the semantic understanding. For example, the target text can be segmented according to its logical structure, content transitions, or natural segmentation points; each paragraph or segment is labeled with the layer it belongs to, and connections between layers can be established from the logical and hierarchical relationships between paragraphs, yielding the layering result. Topic words can then be extracted for the paragraphs of each layer, and the node titles determined from the extracted topic words. Here, a node title is the title of a chapter of the presentation determined from the target text, used to organize and present the presentation's hierarchy and logical relationships.
It should be understood that extracting the node titles from the target text may be implemented with a large language model, such as the iFlytek Spark large model, a BERT model, or a RoBERTa model; the embodiment of the present invention does not limit the choice.
After the node titles are extracted, an ordering criterion can be chosen according to hierarchy depth, importance, chronology, or other requirements, and the node titles sorted accordingly to obtain the sorted node titles. The target outline is then formed from the determined title of the target presentation and the sorted node titles.
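A small sketch of assembling the target outline from the deck title and sorted node titles; the (order_key, title) pairing and the dict shape are assumptions made for illustration and reused by the later sketches.

```python
def build_outline(deck_title: str, node_titles: list[tuple[int, str]]) -> dict:
    """Assemble the target outline from the deck title and node titles.
    Each node title carries an (order_key, title) pair, e.g. the title's
    position in the source text; sorting by the key fixes the chapter order.
    Outline shape: {"title": str, "nodes": [{"title": str, "subtitles": [str, ...]}]}."""
    ordered = sorted(node_titles, key=lambda pair: pair[0])
    return {
        "title": deck_title,
        "nodes": [{"title": title, "subtitles": []} for _, title in ordered],
    }
```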
According to the method provided by the embodiment of the invention, because the title of the target presentation and the node titles are both extracted from the target text, they are all relevant to it, and the constructed target outline is accurate, clear, and logical.
Based on the above embodiment, the node titles comprise primary titles and secondary titles; accordingly, step 132 specifically includes:
layering the target text, and extracting the primary titles based on the layering result;
and matching each primary title against the text segments in the target text, and generating the secondary titles under each primary title based on the segments matched with that primary title.
Specifically, the node titles may include primary titles and secondary titles. A primary title is a chapter title in the presentation, summarizing and introducing the content of the whole chapter; a secondary title is a sub-title or section title under a primary title. The primary titles provide the overall framework and main points, while the secondary titles further refine and expand those points, making the content more specific and detailed. Using primary and secondary titles makes the structure of the presentation clearer and easier to understand.
After the title of the target presentation is determined, semantic analysis can be performed on the title and the target text, and the target text layered based on the semantic understanding, yielding a layering result, i.e., a number of paragraphs at different levels; for the paragraphs of each level, the corresponding primary title can be extracted. After the primary titles are extracted, each can be matched against the text segments in the target text, so that the secondary titles under each primary title are determined from the successfully matched segments. Each generated secondary title is associated with its primary title, forming the hierarchical node titles.
It can be understood that the matching between each primary title and the text segments in the target text may be performed semantically, or may follow the temporal logic of the target text or the order in which the text content was acquired; the embodiment of the present invention does not limit this.
Further, after each primary title and its secondary titles have been extracted, the target outline can be constructed. Before the target outline is constructed, the user can be allowed to choose the content richness of the target presentation. For example, if the user selects "standard" richness, the constructed target outline includes both primary and secondary titles; if the user selects "concise", the target outline includes only primary titles; the user can also select "single page" to meet specific needs.
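A sketch of the richness option, operating on the outline shape assumed above; the option names mirror the "standard", "concise", and "single page" choices just described.

```python
def apply_richness(outline: dict, richness: str) -> dict:
    """Filter the outline by the user's chosen content richness:
    'standard' keeps primary and secondary titles, 'concise' keeps primary
    titles only, and 'single' collapses the deck onto one page."""
    if richness == "concise":
        nodes = [{"title": n["title"], "subtitles": []} for n in outline["nodes"]]
    elif richness == "single":
        nodes = [{"title": outline["title"], "subtitles": []}]
    else:  # "standard"
        nodes = outline["nodes"]
    return {"title": outline["title"], "nodes": nodes}
```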
Based on any of the above embodiments, in step 140, generating the target presentation containing the target text based on the target outline and the presentation text of each node in the target outline includes:
determining a theme template of the target presentation, the theme template being determined based on the target text and/or user input;
and filling the target outline and the presentation text of each node in the target outline into the theme template to obtain the target presentation.
Specifically, a theme template is a pre-designed template specifying the overall layout, colors, fonts, icons, background images, and other stylistic elements of a presentation. The theme template of the target presentation may be determined from the target text, from user input, or from both; the embodiment of the present invention does not limit this.
For example, the large language model may recommend a theme template that matches the presentation to be generated based on the determined topic type of the target presentation. The model may display multiple built-in theme templates in different styles for the user to choose from, with one selected by default; the default may be a template the model automatically recommends as matching the topic type of the target presentation, or one recommended according to the preferences of most users. On the theme template display page, if the user is not satisfied with the default, they can choose manually from the displayed templates, thereby determining the final theme template of the target presentation.
After the theme template is determined, the large language model can fill the corresponding content into the template page by page according to the target outline and the presentation text of each node, generating the cover page, table of contents page, chapter pages, body pages, ending page, and so on of the target presentation, which are rendered and displayed for the user to view in the web page.
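To make the fill-by-template idea concrete, the sketch below uses python-pptx to pour an outline and per-node presentation text into a .pptx theme file. This is only an analogy: the patent describes rendering with HTML in the web page, and the layout indices and the `bodies` mapping are assumptions.

```python
# Illustrative only: the patent fills an HTML page template in the browser;
# python-pptx is used here to show the same fill-by-template idea for .pptx.
from pptx import Presentation      # pip install python-pptx
from pptx.util import Pt

def fill_template(template_path: str, outline: dict, bodies: dict[str, str], out_path: str) -> None:
    """Pour the outline and per-node presentation text into a theme file.
    `outline` follows the {"title", "nodes"} shape sketched earlier; `bodies`
    maps a node title to its presentation text (both shapes are assumptions)."""
    deck = Presentation(template_path)                        # the .pptx theme carries the styling
    cover = deck.slides.add_slide(deck.slide_layouts[0])      # layout 0: title slide
    if cover.shapes.title is not None:
        cover.shapes.title.text = outline["title"]
    for node in outline["nodes"]:
        slide = deck.slides.add_slide(deck.slide_layouts[1])  # layout 1: title and content
        if slide.shapes.title is not None:
            slide.shapes.title.text = node["title"]
        body = slide.placeholders[1].text_frame               # idx 1: body placeholder
        body.text = bodies.get(node["title"], "")
        body.paragraphs[0].font.size = Pt(18)
    deck.save(out_path)
```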
It can be understood that the method provided by the embodiment of the invention does not rely on third-party software when generating the target presentation: it is implemented with HTML technology in the web page, and the whole generation process is displayed by page-by-page rendering, giving the user a more intuitive view of the generation of the target presentation.
According to the method provided by the embodiment of the invention, determining the theme template of the target presentation based on the target text and/or user input improves the consistency, professionalism, and visual effect of the presentation, while supporting personalized customization to meet the user's specific needs and style.
Based on any of the above embodiments, Fig. 3 is a schematic flow chart of adjusting the target presentation provided by the present invention. As shown in Fig. 3, after step 140, the method further includes:
Step 310, displaying the target presentation, and determining a modification operation;
Step 320, determining the content to be modified and the modification type in the target presentation based on the modification operation;
Step 330, performing the modification of the given type on the content to be modified to obtain the target content;
Step 340, adjusting the target presentation based on the target content, and displaying the adjusted target presentation.
It should be noted that, since the user may be dissatisfied with some of the content after the target presentation is generated, the target presentation can be displayed directly in the web page after generation, and the user can be allowed to modify and adjust each page of its content.
Specifically, on the page displaying the target presentation, a modification operation can be determined. A modification operation is a specific operation that modifies or edits the target presentation; it may include adding, deleting, or modifying text content, adjusting the layout, changing colors, inserting or deleting images, and so on.
In one embodiment, the modification operation may be one the system automatically determines from information such as the user's modification history. For example, if the user changed the font size of the presentation title in a previous generation session, the system can determine the specific modification operation from that history.
In another embodiment, the modification operation may be performed by the user through an input device such as a mouse or keyboard. For example, the user may circle or swipe to select the place to be modified, or issue a modification instruction by voice input; after receiving the instruction, the system can determine the modification operation.
After the modification operation is determined, the content to be modified and the modification type in the target presentation can be determined from it. For example, the user can first select the text or image to be modified with the mouse, the page can display the available modification operation types, and the content to be modified and the modification type can be determined from the type the user selects.
Here, the content to be modified is the text, image, or other element of the target presentation that the modification operation indicates needs changing; for example, the user may select a title, paragraph text, or chart data to modify. The modification type is the specific modification to be performed on that content, for example adding new text, deleting existing text, modifying text content or style, resizing an image, or changing a color.
After the content to be modified and the modification type are determined, the modification of that type can be performed on the content, i.e., the content is actually edited or adjusted, yielding the target content. The presentation text of the target presentation can then be updated and adjusted based on the target content; for example, the content to be modified can be replaced directly with the target content, producing the adjusted target presentation, which is displayed for the user to view.
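A sketch of dispatching a modification by type; the type names and the caller-supplied `call_llm` hook are assumptions mirroring the AI operations described below.

```python
def apply_modification(content: str, mod_type: str, call_llm) -> str:
    """Perform one edit of the given type on the selected content. Local edit
    types are handled directly; LLM-backed types are delegated to the
    caller-supplied `call_llm` function (an assumed hook, not a disclosed API)."""
    if mod_type == "delete":
        return ""
    if mod_type in {"polish", "expand", "simplify", "translate"}:
        return call_llm(f"Please {mod_type} the following slide text:\n{content}")
    raise ValueError(f"unsupported modification type: {mod_type}")
```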
Further, the user can be allowed to modify and adjust the content of the target presentation with the AI capabilities of the large language model, including polishing, expanding, simplifying, and translating text. For example, if the user feels a passage in the target presentation is too long and wants it shortened, the user can select the text content to be modified; an AI icon appears on the presentation page, clicking it reveals the available modification operation types (such as polish, expand, simplify, and translate), and after the user clicks "simplify", the large language model simplifies the selected text and replaces it with the simplified version. Preferably, an AI edit box can be displayed on the target presentation's display page showing each AI processing step, and if the user is not satisfied with the AI-processed text, they can switch back to the original content. In the embodiment of the invention, the AI processing functions are implemented directly in the web page displaying the target presentation, giving the user a stronger sense of control over the AI operations and a better interactive experience.
In addition, after selecting the text content to be modified, the user can be allowed to change the font color, background color, font weight, underlining, and so on; the user can also be allowed to change the theme template of the whole target presentation, switching between templates with different colors, formats, pattern layouts, and so on.
According to the method provided by the embodiment of the invention, after the target presentation is generated it can be displayed online, and the user is supported in modifying and adjusting it online, so that the user can produce a presentation that better matches their needs and expectations.
Based on the above embodiment, in step 340, adjusting the target presentation based on the target content includes:
adjusting the presentation notes text of the target presentation based on the target content, wherein the presentation notes text is determined based on the content displayed on each page of the target presentation, the content comprising at least one of text, tables, and images.
Specifically, the presentation notes text is additional text accompanying the content shown on each page of the target presentation, providing extra explanation, description, or remarks. It can be consulted by the presenter during the presentation, or remind the presenter of key information or points to make. The presentation notes text may be determined from the content shown on each page of the target presentation, i.e., the text, charts, images, or other elements presented on each slide, where the text may be content shown directly on the slide or content extracted from audio and/or video the slide contains.
It is understood that when generating the target presentation, the large language model may synchronously generate the presentation notes text for each page from the content of that page. When the user modifies the content of the target presentation to obtain the target content, the corresponding notes text can be adjusted synchronously based on the target content, keeping it consistent with the content displayed on each page of the target presentation.
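A sketch of keeping the notes in sync after edits, using python-pptx's notes API; the per-slide list of regenerated notes is assumed to come from the LLM step described above.

```python
from pptx import Presentation  # pip install python-pptx

def sync_notes(pptx_path: str, notes_per_slide: list[str], out_path: str) -> None:
    """Rewrite each slide's speaker notes so they track the edited page content.
    notes_per_slide[i] is the regenerated notes text for slide i (assumed to
    come from the LLM step described above)."""
    deck = Presentation(pptx_path)
    for slide, note in zip(deck.slides, notes_per_slide):
        # notes_slide is created on first access if the slide has none yet
        slide.notes_slide.notes_text_frame.text = note
    deck.save(out_path)
```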
Further, on the display page of the target presentation, a catalog icon can be displayed, the user clicks a chapter structure capable of viewing the whole target presentation, clicks a certain node title, and can jump to the corresponding target presentation page. If the user modifies and adjusts the node titles of the target presentation, the displayed chapter structure is synchronously updated.
In addition, after the user modifies the target presentation, the target presentation can be downloaded and stored, namely, the complete content of the target presentation displayed at the webpage end can be converted into a PPT format for downloading and storing.
According to the method provided by the embodiment of the invention, presentation remark text is generated for the target presentation and can be adjusted based on the target content, which helps the presenter better understand and interpret the content of the target presentation and improves the effectiveness and professionalism of the presentation.
Based on any one of the above embodiments, an embodiment of the present invention provides a presentation generating method based on a large language model, the method comprising:
S1, receiving a presentation generating instruction, wherein the presentation generating instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form;
S2, extracting text from the description information to obtain a target text;
Specifically, in the case where the description information includes audio, voice transcription is performed on the audio to obtain a transcription text, and speech normalization is performed on the transcription text to obtain the target text; in the case where the description information includes video, images are extracted from the video and character recognition is performed on each extracted image to obtain the target text; in the case where the description information includes both audio and video, the audio is transcribed to obtain a transcription text, images are extracted from the video and recognized, and the character recognition result is compared with the transcription text to obtain the target text. A sketch of these three branches follows.
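The three branches can be sketched as follows. OpenCV and pytesseract are real libraries used here for frame sampling and character recognition, while `transcribe`, the normalization rules and the merge step are simplified assumptions standing in for the embodiment's actual components.

```python
import cv2            # frame extraction from video
import pytesseract    # character recognition on extracted frames

def ocr_video(video_path: str, every_n_frames: int = 30) -> list[str]:
    """Sample frames from the video and run character recognition on each."""
    cap = cv2.VideoCapture(video_path)
    texts, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n_frames == 0:
            texts.append(pytesseract.image_to_string(frame))
        i += 1
    cap.release()
    return texts

def normalize_speech(text: str) -> str:
    """Toy stand-in for speech normalization: drop common filler words."""
    for filler in ("um, ", "uh, ", "you know, "):
        text = text.replace(filler, "")
    return text.strip()

def reconcile(transcript: str, ocr_text: str) -> str:
    """Naive comparison step: keep the transcript, append OCR lines it lacks."""
    extra = [ln for ln in ocr_text.splitlines() if ln and ln not in transcript]
    return transcript + "\n" + "\n".join(extra)

def extract_target_text(transcribe, audio_path=None, video_path=None) -> str:
    if audio_path and not video_path:                      # audio only
        return normalize_speech(transcribe(audio_path))
    if video_path and not audio_path:                      # video only
        return "\n".join(ocr_video(video_path))
    transcript = transcribe(audio_path)                    # audio + video
    return reconcile(transcript, "\n".join(ocr_video(video_path)))
```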
S3, extracting an outline from the target text to obtain a target outline;
Specifically, after the target text is obtained, the large language model can extract the topic of the target presentation, then extract the primary titles of the target presentation based on the topic and the target text, match each primary title with the language segments in the target text, and generate the secondary titles under each primary title based on the matched segments. Finally, the primary titles are combined with their secondary titles to construct and display the target outline. An illustrative sketch of this step follows.
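In the sketch below, simple word-overlap scoring and first-sentence extraction stand in for whatever the large language model actually does, so this is an assumption rather than the embodiment itself.

```python
def match_segments(title: str, segments: list[str]) -> list[str]:
    """Return the language segments sharing at least one word with the title."""
    title_words = set(title.lower().split())
    return [s for s in segments if title_words & set(s.lower().split())]

def build_outline(topic: str, primary_titles: list[str],
                  segments: list[str]) -> dict:
    """Combine primary titles with derived secondary titles into an outline."""
    outline = {"topic": topic, "nodes": []}
    for title in primary_titles:
        matched = match_segments(title, segments)
        # Take the first sentence of each matched segment as a secondary title.
        secondary = [s.split(".")[0].strip() for s in matched]
        outline["nodes"].append({"title": title, "children": secondary})
    return outline
```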
S4, receiving a first modification operation, and modifying and adjusting the target outline;
Specifically, since the generated target outline may contain parts the user is not satisfied with, the target outline can be displayed after generation so that the user can modify and adjust it. During the user's modification, the large language model can also examine the edited content; if the edits are detected to be inconsistent with the previously generated target outline, the user can be prompted, providing a friendlier interactive experience. In addition, if the generated target outline falls short of the user's expectations, the user can trigger regeneration of a new target outline.
S5, determining a theme template;
Specifically, if the user considers that the generated target outline meets the requirements, the user can click "Next" at the bottom of the target outline presentation page to enter the theme template selection page. On this page, a plurality of built-in theme templates can be displayed for the user to choose from, with one selected by default; the default theme template may be a template automatically recommended to match the topic text input by the user and the generated target outline, or may be recommended automatically according to the user's historical selections. One plausible default-recommendation heuristic is sketched below.
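The sketch assumes each built-in template carries a set of descriptive tags; the scoring rule is an illustrative assumption, not the model's actual recommendation logic.

```python
from collections import Counter

def recommend_template(topic: str, templates: dict[str, set[str]],
                       history: list[str]) -> str:
    """Pick a default theme template from topic match, else from user history."""
    topic_words = set(topic.lower().split())
    # Score each template by how many of its tags appear in the topic text.
    scored = {name: len(tags & topic_words) for name, tags in templates.items()}
    best, score = max(scored.items(), key=lambda kv: kv[1])
    if score > 0:
        return best
    # No tag overlap: fall back to the template the user has chosen most often.
    return Counter(history).most_common(1)[0][0] if history else best
```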
S6, rendering and generating the target presentation and displaying it;
Specifically, after the theme template is determined, the user can click "Generate"; upon receiving the user's instruction, the large language model generates the cover page, catalog page, chapter pages, body pages and end page of the target presentation page by page according to the target outline and the target text, and renders and displays them at the web page end for the user to view. It should be understood that, during rendering at the web page end, the large language model may recommend different layouts according to the relationships among the texts in the presentation body of the generated target presentation (such as parallel relationships or general-to-specific relationships), as sketched below.
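A toy sketch of such relation-driven layout selection; the relation labels and layout names are illustrative assumptions, not the model's actual vocabulary.

```python
# Map the detected relationship among body texts to a rendering layout.
LAYOUT_BY_RELATION = {
    "parallel": "multi_column",             # items of equal weight, side by side
    "general_to_specific": "headline_with_bullets",
    "sequence": "timeline",
}

def pick_layout(relation: str) -> str:
    """Fall back to a plain layout when the relation is unrecognized."""
    return LAYOUT_BY_RELATION.get(relation, "plain_text")
```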
In addition, when the target presentation is generated, the large language model can synchronously generate the presentation remark text of each page according to the content displayed by each page in the target presentation.
S7, receiving a second modification operation, and modifying and adjusting the target presentation;
Specifically, after each page of the target presentation is generated, the user is supported in editing it online at the web page end; for example, text or images can be added, modified or deleted. The user can left-click the title or body text of a page to enter an editing mode, in which such additions, modifications or deletions of text or images can be made.
Further, the user can be supported in modifying the content of the target presentation with the AI functions of the large language model, where the modification may include beautifying, expanding, simplifying and translating the text. In addition, when the user modifies the content of the target presentation, the large language model can synchronously adjust the corresponding presentation remark text, keeping it consistent with the content displayed on each page of the target presentation.
S8, downloading and exporting the target presentation;
Specifically, after the user has modified the target presentation, it can be downloaded and saved; that is, the complete content of the target presentation displayed at the web page end can be converted into PPT format for downloading and storage. A sketch of such an export step, using the python-pptx library, follows.
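A sketch of the export using python-pptx (a real library); the page structure passed in is an assumed simplification of the web page content, and the remark text of each page is stored as speaker notes.

```python
from pptx import Presentation

def export_to_pptx(pages: list[dict], path: str) -> None:
    """Write each page as a slide, storing remark text as speaker notes."""
    prs = Presentation()
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for page in pages:
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = page["title"]
        slide.placeholders[1].text_frame.text = page["body"]
        slide.notes_slide.notes_text_frame.text = page.get("notes", "")
    prs.save(path)

# Usage with one assumed page dict:
export_to_pptx(
    [{"title": "Overview", "body": "Key points...", "notes": "Greet audience."}],
    "target_presentation.pptx",
)
```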
Based on any of the above embodiments, fig. 4 is a schematic structural diagram of a presentation generating device provided by the present invention; as shown in fig. 4, the device includes:
the instruction receiving unit 410 is configured to receive a presentation generating instruction, where the presentation generating instruction carries description information for a presentation to be generated, and the description information describes content of the presentation to be generated in audio and/or video;
a text extraction unit 420, configured to extract text from the description information to obtain a target text;
the outline extraction unit 430 is configured to perform outline extraction on the target text to obtain a target outline;
the document generation unit 440 is configured to extract the demonstration text of each node in the target outline from the target text, and generate a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
According to the device provided by the embodiment of the invention, the target text is obtained by extracting text from the description information carried in the presentation generating instruction, the target outline is obtained by extracting an outline from the target text, and the demonstration text of each node in the target outline is extracted from the target text, so that the target presentation can be generated automatically based on the target outline and the demonstration text of each node. This saves the time and labor of producing a presentation manually and improves working efficiency, while the generated presentation has a consistent style and layout, ensuring overall unity and professionalism. In addition, since the description information describes the content of the presentation to be generated in audio and/or video form, a corresponding presentation can be generated quickly from a piece of audio and/or video provided by the user, with simple operation and high efficiency.
Based on any of the above embodiments, the text extraction unit 420 is specifically configured to:
under the condition that the description information comprises audio, carrying out voice transcription on the audio to obtain a transcription text, and carrying out speech normalization on the transcription text to obtain a target text;
under the condition that the description information comprises video, extracting images from the video, and recognizing characters of each extracted image to obtain a target text;
under the condition that the description information comprises audio and video, carrying out voice transcription on the audio to obtain a transcription text, extracting images of the video, carrying out character recognition on each extracted image, and comparing a character recognition result with the transcription text to obtain a target text.
Based on any of the above embodiments, the document generation unit 440 specifically includes:
the node outline extraction subunit is used for extracting node outline information respectively associated with each node from the target outline;
and the demonstration text determining subunit is used for matching the node outline information of each node with each language segment in the target text and determining the demonstration text of each node based on the successfully matched language segments.
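One plausible realization of the matching performed by the demonstration text determining subunit, sketched with TF-IDF cosine similarity from scikit-learn; the embodiment does not commit to a particular similarity measure, so the vectorizer and threshold here are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_nodes_to_segments(node_infos: list[str], segments: list[str],
                            threshold: float = 0.2) -> dict[int, list[int]]:
    """Return, for each node index, the indices of its matching segments."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(node_infos + segments)
    node_vecs = matrix[: len(node_infos)]
    seg_vecs = matrix[len(node_infos):]
    similarities = cosine_similarity(node_vecs, seg_vecs)
    return {i: [j for j, s in enumerate(similarities[i]) if s >= threshold]
            for i in range(len(node_infos))}
```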
Based on any of the above embodiments, the demonstration text determining subunit is specifically configured to:
When the description information comprises a video, respectively associating each image included in the video with each language segment in the target text;
determining the textual content of the node based on the successfully matched language segments, and determining the demonstration image of the node based on the images associated with the successfully matched language segments;
and determining the demonstration text of the node based on the textual content and the demonstration image of the node. A timestamp-based sketch of the image-to-segment association follows.
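A sketch of one way to perform the association, under the assumption that frames and transcript segments both carry timestamps:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    start: float  # seconds
    end: float

@dataclass
class Frame:
    image_path: str
    timestamp: float

def associate_frames(segments: list[Segment],
                     frames: list[Frame]) -> dict[int, list[str]]:
    """Attach each frame to the segment whose time span contains it."""
    assoc: dict[int, list[str]] = {i: [] for i in range(len(segments))}
    for frame in frames:
        for i, seg in enumerate(segments):
            if seg.start <= frame.timestamp < seg.end:
                assoc[i].append(frame.image_path)
                break
    return assoc
```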
Based on any of the above embodiments, the outline extraction unit 430 specifically includes:
the topic determination subunit is used for extracting keywords from the target text and determining the topic of the target presentation based on the extracted keywords;
the node title extraction subunit is used for layering the target text and extracting node titles based on the layering result;
and the outline determining subunit is used for sorting the node titles and determining the target outline based on the topic and the sorted node titles.
Based on any of the above embodiments, the node header includes a primary header and a secondary header, and the node header extraction subunit is specifically configured to:
layering the target text, and extracting a first-level title based on the layering result;
and matching each primary title with each language segment in the target text, and generating a secondary title under each primary title based on the language segments matched with each primary title.
Based on any of the above embodiments, the document generation unit 440 further includes a template determination subunit for:
determining a theme template of the target presentation, the theme template being determined based on the target text and/or user input;
and filling the target outline and the text content of each node in the target outline into the theme template to obtain the target presentation.
Based on any one of the above embodiments, the apparatus further includes a document modification unit, and the document modification unit specifically includes:
the display subunit is used for displaying the target presentation and determining a modification operation;
the determining subunit is used for determining the content to be modified and the modification type in the target presentation file based on the modification operation;
a modification subunit, configured to perform the modification operation of the modification type on the content to be modified, so as to obtain the target content;
and the adjusting subunit is used for adjusting the target presentation file based on the target content and displaying the adjusted target presentation file.
Based on any of the above embodiments, the adjustment subunit is specifically configured to:
and adjusting the presentation remark text of the target presentation based on the target content, wherein the presentation remark text is determined based on the content displayed on each page in the target presentation, and the content comprises at least one of text, a table and an image.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device; as shown in fig. 5, the electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a presentation generating method comprising: receiving a presentation generating instruction, wherein the presentation generating instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form; extracting text from the description information to obtain a target text; extracting an outline from the target text to obtain a target outline; and extracting the demonstration text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program which can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the presentation generating method provided by the above methods, the method comprising: receiving a presentation generating instruction, wherein the presentation generating instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form; extracting text from the description information to obtain a target text; extracting an outline from the target text to obtain a target outline; and extracting the demonstration text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the presentation generating method provided by the above methods, the method comprising: receiving a presentation generating instruction, wherein the presentation generating instruction carries description information for a presentation to be generated, and the description information describes the content of the presentation to be generated in audio and/or video form; extracting text from the description information to obtain a target text; extracting an outline from the target text to obtain a target outline; and extracting the demonstration text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A presentation generating method, comprising:
receiving a presentation generating instruction, wherein the presentation generating instruction carries description information aiming at a presentation to be generated, and the description information describes the content of the presentation to be generated in an audio and/or video mode;
extracting the text of the description information to obtain a target text;
extracting the outline of the target text to obtain a target outline;
and extracting the demonstration text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
2. The presentation generating method according to claim 1, wherein the extracting the text of the description information to obtain a target text includes:
under the condition that the description information comprises audio, carrying out voice transcription on the audio to obtain a transcription text, and carrying out speech normalization on the transcription text to obtain the target text;
under the condition that the description information comprises video, extracting images of the video, and recognizing characters of each extracted image to obtain the target text;
and under the condition that the description information comprises audio and video, carrying out voice transcription on the audio to obtain a transcription text, carrying out image extraction on the video, carrying out character recognition on each image obtained by extraction, and comparing a character recognition result with the transcription text to obtain the target text.
3. The presentation generating method according to claim 1, wherein the extracting the demonstration text of each node in the target outline from the target text includes:
extracting node outline information respectively associated with each node from the target outline;
and matching the node outline information of each node with each language segment in the target text, and determining the demonstration text of each node based on the successfully matched language segments.
4. The presentation generating method as claimed in claim 3, wherein the determining the demonstration text of each node based on the successfully matched language segments comprises:
when the description information comprises a video, respectively associating each image included in the video with each language segment in the target text;
determining the textual content of the node based on the successfully matched language segments, and determining the demonstration image of the node based on the images associated with the successfully matched language segments;
and determining the demonstration text of the node based on the textual content and the demonstration image of the node.
5. The presentation generating method according to claim 1, wherein the extracting the outline of the target text to obtain the target outline includes:
extracting keywords from the target text, and determining the topic of the target presentation based on the extracted keywords;
layering the target text, and extracting node titles based on layering results;
and sorting the node titles, and determining the target outline based on the topic and the sorted node titles.
6. The presentation generating method as claimed in claim 5, wherein the node title comprises a primary title and a secondary title;
the layering the target text and extracting node titles based on layering results comprises the following steps:
layering the target text, and extracting a first-level title based on a layering result;
and matching each primary title with each language segment in the target text, and generating a secondary title under each primary title based on the language segments matched with each primary title.
7. The presentation generating method according to any one of claims 1 to 6, wherein the generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline includes:
determining a theme template of the target presentation, the theme template being determined based on the target text and/or user input;
and filling the target outline and the text content of each node in the target outline into the theme template to obtain the target presentation.
8. The presentation generating method according to any one of claims 1 to 6, wherein the generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline further comprises:
displaying the target presentation, and determining modification operation;
determining the content to be modified and the modification type in the target presentation based on the modification operation;
executing the modification operation under the modification type on the content to be modified to obtain target content;
and adjusting the target presentation based on the target content, and displaying the adjusted target presentation.
9. The presentation generating method as claimed in claim 8, wherein said adjusting the target presentation based on the target content comprises:
and adjusting the presentation remark text of the target presentation based on the target content, wherein the presentation remark text is determined based on the content displayed on each page in the target presentation, and the content comprises at least one of text, a table and an image.
10. A presentation generating apparatus, comprising:
The instruction receiving unit is used for receiving a presentation generating instruction, wherein the presentation generating instruction carries description information aiming at a presentation to be generated, and the description information describes the content of the presentation to be generated in an audio and/or video mode;
the text extraction unit is used for extracting the text of the description information to obtain a target text;
the outline extraction unit is used for extracting the outline of the target text to obtain a target outline;
and the document generating unit is used for extracting the demonstration text of each node in the target outline from the target text, and generating a target presentation containing the target text based on the target outline and the demonstration text of each node in the target outline.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the presentation generation method of any of claims 1 to 9 when the program is executed.
12. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the presentation generation method of any of claims 1 to 9.
CN202311385580.6A 2023-10-23 2023-10-23 Presentation generation method and device, electronic equipment and storage medium Pending CN117436415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311385580.6A CN117436415A (en) 2023-10-23 2023-10-23 Presentation generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117436415A true CN117436415A (en) 2024-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination