CN112836476B - Summary generation method, device, equipment and medium - Google Patents

Summary generation method, device, equipment and medium

Info

Publication number
CN112836476B
CN112836476B (application CN202110156415.8A)
Authority
CN
China
Prior art keywords
text
initial
target
sentence
transcription
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110156415.8A
Other languages
Chinese (zh)
Other versions
CN112836476A (en)
Inventor
徐文铭
杨晶生
杜春赛
郑翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202110156415.8A
Publication of CN112836476A
Application granted
Publication of CN112836476B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/16 Automatic learning of transformation rules, e.g. from examples
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting

Abstract

The disclosed embodiments relate to a summary generation method, apparatus, device, and medium. The method comprises: obtaining an initial text of a target multimedia; performing transcription processing on the initial text based on a data cleaning model to obtain a target text; generating an initial summary based on the target text; and mapping the initial summary back to the initial text to determine a corresponding target summary. With this technical scheme, the text of the target multimedia is transcribed by a deep learning model and the summary is generated from the transcribed text. Because the transcription process rewrites the syntactic structure of the text and removes noise, the transcribed text better meets the requirements of various summary algorithms, so the subsequently generated summary is more accurate and effective. Moreover, because the summary is mapped back to the initial text, the user can see the association between the summary and the initial text, which enables better service interaction between the two.

Description

Summary generation method, device, equipment and medium
Technical Field
The present disclosure relates to the field of recognition technologies, and in particular, to a summary generation method, apparatus, device, and medium.
Background
With the continuous development of smart devices and multimedia technologies, information recording by smart devices is increasingly applied in daily and office life.
In some related products, multimedia files on which information has been recorded can be processed to generate key points, allowing important information to be acquired quickly. However, the key points generated in this way currently suffer from low accuracy.
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides a summary generation method, apparatus, device, and medium.
The embodiment of the disclosure provides a summary generation method, which comprises the following steps:
acquiring an initial text of a target multimedia;
performing transcription processing on the initial text based on a data cleaning model to obtain a target text;
generating an initial summary based on the target text;
and mapping the initial summary to the initial text, and determining a corresponding target summary.
The disclosed embodiment also provides a summary generation device, the device includes:
the text acquisition module is used for acquiring an initial text of the target multimedia;
the transfer module is used for performing transfer processing on the initial text based on the data cleaning model to obtain a target text;
an initial summary module for generating an initial summary based on the target text;
and the target summary module is used for mapping the initial summary to the initial text and determining a corresponding target summary.
An embodiment of the present disclosure further provides an electronic device, which includes: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the summary generation method provided by the embodiment of the disclosure.
The embodiment of the present disclosure also provides a computer-readable storage medium, which stores a computer program for executing the summary generation method provided by the embodiment of the present disclosure.
Compared with the prior art, the technical scheme provided by the embodiments of the present disclosure has the following advantages. The summary generation scheme of the disclosed embodiments obtains an initial text of a target multimedia, transcribes the initial text based on a data cleaning model to obtain a target text, generates an initial summary based on the target text, and maps the initial summary back to the initial text to determine a corresponding target summary. With this technical scheme, the text of the target multimedia is transcribed by a deep learning model and the summary is generated from the transcribed text. Because the transcription process rewrites the syntactic structure of the text and removes noise, the transcribed text better meets the requirements of various summary algorithms, so the subsequently generated summary is more accurate and effective. Moreover, because the summary is mapped back to the initial text, the user can see the association between the summary and the initial text, which enables better service interaction between the two.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart diagram of a summary generation method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart diagram of another summary generation method provided in the embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a summary generation apparatus provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
After a conference ends, the audio or video recorded during the conference can be converted into text through recognition processing. In scenarios involving knowledge mining on such text, a common task is to automatically generate a text summary, also called a text abstract, using computer technology. Whether a supervised or an unsupervised summarization algorithm is used, the raw data must be relatively accurate and clean.
Conversational scenarios such as video conferences contain a great deal of noise in their data, and the conversion accuracy of a speech recognition engine cannot reach 100%, so this real-world noise seriously interferes with the normal operation of the algorithm. At present there are two approaches to cleaning noisy data: one is to perform data enhancement during training, introducing real noise data to train the summarization model; the other is to perform unified pre-processing, cleaning both the training data and the test data before they are fed to the model. Because conference text contains, in addition to noise, a large number of unstructured spoken-language expressions, traditional data cleaning methods are difficult to apply effectively, and the resulting summary has low accuracy. To solve this problem, embodiments of the present disclosure provide a summary generation method, described below with reference to specific embodiments.
Fig. 1 is a schematic flowchart of a summary generation method provided in an embodiment of the present disclosure, which may be executed by a summary generation apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 1, the method includes:
step 101, obtaining an initial text of a target multimedia.
The target multimedia may be any multimedia data used to record information; for example, it may be conference multimedia, that is, multimedia data recording a conference process. The initial text refers to the text content obtained after the target multimedia is recognized and processed using Automatic Speech Recognition (ASR) technology. The embodiment of the present disclosure does not limit the specific speech recognition technology; for example, a stochastic model method or an artificial neural network method may be used.
In the embodiment of the disclosure, an initial text obtained by processing a target multimedia in advance may be obtained, or the target multimedia may be obtained and processed to obtain the initial text.
And 102, performing transcription processing on the initial text based on the data cleaning model to obtain a target text.
The data cleaning model can be a pre-trained deep learning model for cleaning data of the initial text, and the specific deep learning model is not limited. In the embodiment of the present disclosure, the data cleaning model is described by taking a generative deep learning model as an example, and the generative deep learning model may be implemented based on a Neural Machine Translation (NMT) algorithm.
In the embodiment of the present disclosure, performing transcription processing on the initial text based on the data cleaning model to obtain the target text may include: inputting the input text sentences in the initial text into the data cleaning model for a transcription operation, and determining a target text composed of the transcribed target text sentences. An input text sentence is obtained by cutting or dividing the initial text into sentences, and there may be a plurality of input text sentences. A target text sentence is the sentence obtained after an input text sentence is processed by the data cleaning model.
Each input text sentence included in the initial text is converted into a sentence vector, each sentence vector is input into the pre-trained data cleaning model, and the transcription operation is performed to obtain the target text sentences, which together form the target text. In the target text produced by the data cleaning model, spoken-language expressions, wrong words, and partially ungrammatical wording are smoothly rewritten while preserving the original meaning, making subsequent data processing easier.
Optionally, the summary generation method in the embodiment of the present disclosure may further include: in the process of transcribing the input text sentences in the initial text, outputting the target text sentences according to a degree adjustment parameter and/or a relevance in the data cleaning model. The degree adjustment parameter represents the degree of transcription, and the relevance represents the correlation between a transcribed target text sentence and the input text sentence before transcription. The smaller the degree adjustment parameter, the closer the output stays to the syntactic structure, semantics, and so on of the input text sentence; the larger the parameter, the more the syntactic structure, semantics, and so on of the input text sentence tend to be rewritten. In the process of using the data cleaning model to transcribe the input text sentences in the initial text, the output of the target text sentences can be controlled by the degree adjustment parameter and/or the relevance; for example, the number of output target text sentences can be controlled, that is, a subset of the transcribed text sentences is selected for output.
Optionally, outputting the target text sentences according to the relevance in the data cleaning model may include: determining a transcribed text sentence whose relevance is greater than or equal to a relevance threshold as a target text sentence, and outputting it. The relevance threshold is a preset minimum relevance value and may be set according to the actual situation. When the input text sentences in the initial text are fed into the data cleaning model for transcription, the relevance between each transcribed text sentence and its input text sentence before transcription can be determined, and the transcribed text sentences whose relevance is greater than or equal to the relevance threshold are determined as target text sentences and output.
In this scheme, the transcription of the text sentences can be controlled through the parameters of the data cleaning model, so that the transcription works better and noise, spoken-language expressions, and the like are handled more effectively.
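As a minimal sketch of the relevance-based filtering described above (the token-overlap score and the 0.3 threshold are illustrative assumptions, not the model's actual relevance measure):

```python
def relevance(original: str, rewritten: str) -> float:
    # Hypothetical relevance score: Jaccard overlap between the token
    # sets of the input sentence and its transcribed rewrite.
    a, b = set(original.split()), set(rewritten.split())
    return len(a & b) / max(len(a | b), 1)

def filter_transcribed(pairs, threshold=0.3):
    # Keep only rewritten sentences whose relevance to the original
    # input sentence meets the relevance threshold.
    return [rewritten for original, rewritten in pairs
            if relevance(original, rewritten) >= threshold]
```

A rewrite that drifts too far from its source sentence is dropped, matching the description of discarding transcriptions whose relevance falls below the threshold.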
And 103, generating an initial summary based on the target text.
The initial summary refers to important information obtained by summarizing and extracting the target text after data cleaning.
In this embodiment of the disclosure, generating an initial summary based on the target text may include: and inputting the target text into a summary generation model, and determining an initial summary. The summary generation model, also called abstract generation model, refers to a pre-trained deep learning model for extracting important information of a text content, and the embodiment of the disclosure does not limit the specific summary generation model, for example, an extraction type summary generation model may be adopted. And inputting the target text obtained in the previous step into a summary generation model, and obtaining an initial summary through extraction type summary generation.
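The embodiment leaves the specific extractive summary generation model open; a frequency-based sentence scorer is one minimal stand-in (the scoring scheme below is a hypothetical simplification, not the model used in the disclosure):

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    # Minimal extractive sketch: score each sentence by the frequency
    # of its words across the whole target text, then keep the top-k
    # sentences in their original order.
    freq = Counter(w for s in sentences for w in s.lower().split())
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sum(freq[w] for w in sentences[i].lower().split()))
    keep = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in keep]
```

A production summary generation model would be a trained neural network, but the interface is the same: cleaned target text in, initial summary sentences out.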
And step 104, mapping the initial summary to the initial text, and determining a corresponding target summary.
Here, the target summary refers to the final summary, composed of sentences of the original pre-cleaning text, obtained by mapping the initial summary back.
In this embodiment of the present disclosure, after performing transcription processing on the initial text based on the data cleansing model to obtain the target text, the method may further include: and establishing a mapping relation between the input text sentence in the initial text and the target text sentence in the target text. That is, after the input text sentence in the initial text is transcribed to obtain the target text sentence, a mapping relationship between the input text sentence before the transcription and the target text sentence after the transcription can be established. The mapping relationship may be stored in a mapping relationship table for later use.
Illustratively, assume that the input text sentences in the initial text are labeled [0,1,2,3,4,5] in order, the target text sentences are labeled [0,1,2,3,4,5,6], and the mapping relationship is {0: [0,1], 1: [2,3], 2: [3], 3: [4,5], 4: [5], 5: [6]}.
Specifically, mapping the initial summary into the initial text, and determining the corresponding target summary may include: and determining an associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as a target summary. Based on the mapping relation determined before, an associated sentence corresponding to the initial summary in the initial text can be determined, and the associated sentence is determined as a final target summary.
For example, assume that the mapping relationship is {0: [0,1], 1: [2,3], 2: [3], 3: [4,5], 4: [5], 5: [6] }, the sentences in the initial summary are marked as [0,1,5], and the sentences in the target summary are [0,4] based on the mapping relationship, that is, the first text sentence and the fifth text sentence in the initial text.
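The map-back step can be sketched as follows, assuming the mapping is stored as {input_index: [target_indices]} as in the examples above, and reading "associated sentence" as any input sentence whose transcribed outputs intersect the initial summary (one plausible interpretation, not the patent's exact rule):

```python
def map_back(mapping, summary_target_ids):
    # mapping: {input_sentence_index: [target_sentence_indices]}
    # summary_target_ids: indices of the initial summary's sentences
    # within the cleaned target text.
    hits = set(summary_target_ids)
    # An input sentence is "associated" if any of its transcribed
    # target sentences appears in the initial summary.
    return sorted(i for i, targets in mapping.items()
                  if hits & set(targets))
```

The returned indices point into the original initial text, so the target summary is composed of original, pre-cleaning sentences.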
In the embodiment of the disclosure, the text is transcribed by a generative deep learning algorithm, which achieves stronger rewriting of the original text's syntactic structure and stronger noise handling, so the generated summary is more accurate. Meanwhile, the mapping relationship between the original input sentences and the transcribed output sentences is stored; after the summary is generated from the cleaned output text, it is mapped back to a summary composed of original input sentences, so that the finally generated summary retains an association with the original text, which facilitates service interaction between the summary and the original text.
The summary generation scheme of the disclosed embodiments obtains an initial text of a target multimedia, transcribes the initial text based on a data cleaning model to obtain a target text, generates an initial summary based on the target text, and maps the initial summary back to the initial text to determine a corresponding target summary. With this technical scheme, the text of the target multimedia is transcribed by a deep learning model and the summary is generated from the transcribed text. Because the transcription process rewrites the syntactic structure of the text and removes noise, the transcribed text better meets the requirements of various summary algorithms, so the subsequently generated summary is more accurate and effective. Moreover, because the summary is mapped back to the initial text, the user can see the association between the summary and the initial text, which enables better service interaction between the two.
In some embodiments, after obtaining the initial text of the target multimedia, the method may further include: carrying out sentence division on the initial text to obtain a plurality of input text sentences; and preprocessing the input text sentence based on a set rule. Optionally, preprocessing the input text statement based on the setting rule may include: erroneous punctuation and/or nonsense words in the input text sentence are deleted.
The input text sentence is obtained by sentence cutting or dividing the initial text, and specifically, the initial text can be cut according to punctuations and converted into a plurality of input text sentences. The setting rule may be a rule for processing a plurality of text sentences to remove obvious errors, and may be specifically set according to an actual situation.
In the embodiment of the disclosure, the initial text is divided into sentences to obtain a plurality of input text sentences, and the input text sentences are then preprocessed based on the set rule. Preprocessing an input text sentence based on the set rule may include: examining the words and punctuation included in each text sentence, judging whether erroneous punctuation and/or nonsense words are present, and deleting them if so. The nonsense words may include filler words, repeated words, and the like; for example, if a text sentence includes the same word twice in a row, one occurrence is deleted. In the embodiment of the present disclosure, a word bank storing a plurality of nonsense words may be provided for preprocessing.
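A rule-based preprocessing pass of this kind might look like the following sketch; the filler-word lexicon and the specific rules are illustrative assumptions, not the patent's actual rule set:

```python
import re

# Hypothetical word bank of nonsense/filler words.
FILLERS = {"um", "uh", "like"}

def preprocess(sentence: str) -> str:
    # Collapse repeated punctuation, e.g. ",," -> ","
    sentence = re.sub(r"([,.!?])\1+", r"\1", sentence)
    out = []
    for word in sentence.split():
        if word.lower() in FILLERS:
            continue  # drop filler words
        if out and word.lower() == out[-1].lower():
            continue  # drop an immediately repeated word
        out.append(word)
    return " ".join(out)
```

Rules like these only remove obvious surface errors; the harder spoken-language restructuring is left to the data cleaning model in the next step.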
In the embodiment of the disclosure, after the multimedia text is acquired, the text sentences included in the text can be preprocessed based on the set rule. Because the set rule performs smoothing and removes obvious errors, the preprocessed text sentences are more amenable to subsequent processing, thereby improving the efficiency and accuracy of the subsequent data cleaning and summary generation.
Fig. 2 is a schematic flow chart of another summary generation method provided in the embodiment of the present disclosure, and the embodiment further optimizes the summary generation method on the basis of the above embodiment. As shown in fig. 2, the method includes:
step 201, obtaining an initial text of the target multimedia.
In some embodiments, after obtaining the initial text of the target multimedia, the method may further include: carrying out sentence division on the initial text to obtain a plurality of input text sentences; and preprocessing the input text sentence based on a set rule. Optionally, preprocessing the input text statement based on the setting rule may include: erroneous punctuation and/or nonsense words in the input text sentence are deleted.
Step 202, inputting the input text sentence in the initial text into the data cleaning model for transcription operation, and determining a target text consisting of the transcribed target text sentence.
Wherein, the data cleaning model is a generative deep learning model.
Illustratively, suppose the initial text is "so is at the meeting in 2019, he claims to push the inside out of the load s h s to reduce the time delay of the official network to within the standard time delay range, i.e., demo is around 22 seconds, and then finally can go into saying what the load s h s he did at all? Just from within his given document we can see that he probably done five new properties, "the target text after transcription can be" so, in 2019 at a congress, he announced that loishs will be started to reduce the official site's delay to the standard delay range, i.e. about 22 seconds is needed for the presentation, then we can now go to the main topic, what did it do for the SHS load at all? From the document he provides to us, you can see that he may have created five new functions ".
For another example, assume that the initial text is "for each bar we say, the feature that i find most useful at first is this c. It is asked that this child is it, except for my original segment ui. It also generates a series of partial c. The name is that you can understand that, for example, it was one segment of six seconds, and then it might generate three and eighteen seconds later to start playing, and then every six seconds later. It will generate some very short fragment ul segments in addition to the original ones. For example, the official website gives 333 ms 1 of 1 app to be shown by the so-called even e x part tag, and the transcribed target text can be "let us discuss one by one". First, i consider the most useful function to be C. Asking if this is its lifetime. The UI will generate a series of parts C in addition to the original internal segments. This name can be understood as a six second time period which then generates three segments. It will start playing after eighteen seconds and update every six seconds, so in addition to the original inner it will generate a very short inner UL segment. For example, its official website says: 1 is 333 ms in the application, one of which shows the so-called ex part mark ".
From the above examples, it can be seen that spoken-language expressions, wrong words, partially ungrammatical wording, and the like are smoothly rewritten after transcription, so that they can be more easily understood and processed by people or machines. Meanwhile, for some obvious recognition errors, for example "lili s h s", which is a misrecognition produced during the recognition process with spaces inserted in the middle, the transcription cannot recover the semantics of that part; instead, the fragment can be treated as a proper noun.
Optionally, after step 202, the method may further include: and establishing a mapping relation between the input text sentence in the initial text and the target text sentence in the target text.
Step 203, in the process of transcribing the input text sentences in the initial text, outputting the target text sentences according to the degree adjustment parameter and/or the relevance in the data cleaning model.
The degree adjustment parameter represents the degree of transcription, and the relevance represents the correlation between a transcribed target text sentence and the input text sentence before transcription. Specifically, outputting the target text sentences according to the relevance in the data cleaning model includes: determining a transcribed text sentence whose relevance is greater than or equal to the relevance threshold as a target text sentence, and outputting it.
Step 204, inputting the target text into a summary generation model, and determining an initial summary.
Step 205, determining the associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as the target summary.
The summary generation method of the embodiments of the present disclosure is further described below with a specific example. Taking conference multimedia as the target multimedia, the overall process for determining the summary may include the following steps. First, the subtitle text or conference text is extracted and stored at sentence granularity in a list[string] format, generating a string list composed of the text strings of the subtitle sentences; it can be understood that this string format is only an example. Second, a first pass of smoothing is performed using regular expressions to remove obvious errors, including redundant punctuation, filler words, repeated words, and the like. Third, a pre-trained seq2seq deep learning model is prepared; this scheme is implemented with the open-source OpenNMT framework based on the NMT algorithm. Fourth, a generative transcription operation is performed on the whole list[string] through the pre-trained model, and the number of output sentences is controlled through parameter tuning. In the NMT model, the degree of transcription of the output can be adjusted through the degree adjustment parameter: the smaller the parameter, the closer the output stays to the input syntactic structure and text semantics; the larger the parameter, the more the input syntactic structure tends to be rewritten. Fifth, the output text is converted and stored sentence by sentence to generate a new text data list new_list[string]; at the same time, the sentence mapping relationship is saved. Sixth, the output new_list[string] is passed as input to the abstract model in the system to perform summary generation.
Seventh, mapping is performed through the mapping relationship between input and output sentences obtained in the fifth step, so that the final abstract result consists of text sentences from the input conference text.
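The seven steps above can be condensed into one schematic loop; `transcribe` stands in for the OpenNMT-based cleaning model and `summarize` for the system's abstract model, both of which are placeholders here:

```python
def clean_and_summarize(initial_sentences, transcribe, summarize):
    # End-to-end sketch of the pipeline: transcribe each input
    # sentence with the cleaning model, record the input->output
    # sentence mapping, summarize the cleaned text, then map the
    # summary back to the original sentences.
    mapping, cleaned = {}, []
    for i, sentence in enumerate(initial_sentences):
        outputs = transcribe(sentence)  # may emit 0..n cleaned sentences
        mapping[i] = list(range(len(cleaned), len(cleaned) + len(outputs)))
        cleaned.extend(outputs)
    summary_ids = summarize(cleaned)    # indices into `cleaned`
    hits = set(summary_ids)
    # Target summary: original sentences whose cleaned outputs
    # appear in the initial summary.
    return [initial_sentences[i] for i, t in mapping.items() if hits & set(t)]
```

Because the mapping is recorded as the cleaned list is built, the pipeline tolerates a cleaning model that splits or drops sentences while still returning verbatim sentences from the original conference text.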
When summary extraction is performed on conference text, the transcribed text may contain a considerable amount of misrecognized content due to the limitations of audio quality and speech-recognition transcription technology. In addition, conversational scenarios such as conferences often contain a great deal of invalid spoken-language information. This erroneous content and invalid information strongly interfere with summary extraction, so pre-cleaning and pre-processing steps are both necessary and effective for summary extraction in this scenario. The embodiment of the disclosure provides a unified pre-cleaning approach based on a generative deep learning algorithm; with such a generative algorithm, a single end-to-end data cleaning scheme can handle various types of data, such as noise, spoken language, and errors, across various scenarios. The conference summary generated in the embodiment of the disclosure is therefore more accurate. Moreover, because the mapping relationship between the text sentences before and after data cleaning is stored, after the summary is generated from the cleaned text, the final conference summary can be obtained by remapping it onto the original conference text based on that mapping relationship, so that the finally generated conference summary retains an association with the original conference text, which facilitates better service interaction between the conference summary and the conference text.
The summary generation scheme provided by the embodiments of the present disclosure obtains an initial text of a target multimedia, inputs the input text sentences in the initial text into a data cleaning model for transcription, and determines a target text composed of the transcribed target text sentences, where during transcription the target text sentences are output according to the degree adjusting parameter and/or the correlation degree in the data cleaning model; the target text is then input into a summary generation model to determine an initial summary, the associated sentences corresponding to the initial summary are determined in the initial text based on the mapping relation, and the associated sentences are determined as the target summary. With this technical scheme, the text of the target multimedia is transcribed by a deep learning model and the summary is generated based on the transcribed text; because the transcription process rewrites the syntactic structure of the text and removes noise, the transcribed text better meets the requirements of various summary algorithms, so that subsequent summary generation is more accurate and effective. Moreover, the summary is mapped back to the initial text, so that the user can know the association relation between the summary and the initial text, enabling better service interaction between the two.
Fig. 3 is a schematic structural diagram of a summary generation apparatus provided in an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 3, the apparatus includes:
a text obtaining module 301, configured to obtain an initial text of a target multimedia;
a transcription module 302, configured to perform transcription processing on the initial text based on a data cleaning model to obtain a target text;
an initial summary module 303, configured to generate an initial summary based on the target text;
a target summary module 304, configured to map the initial summary into the initial text, and determine a corresponding target summary.
Optionally, the transcription module 302 is specifically configured to:
inputting the input text sentences in the initial text into the data cleaning model for a transcription operation, and determining a target text composed of the transcribed target text sentences.
Optionally, the apparatus further includes a transcription control module, configured to:
in the process of transcribing the input text sentences in the initial text, outputting the target text sentences according to the degree adjusting parameter and/or the correlation degree in the data cleaning model.
Optionally, the transcription control module is specifically configured to:
determining a transcribed sentence whose correlation degree with its input text sentence is greater than or equal to a correlation degree threshold as the target text sentence, and outputting the target text sentence.
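As an illustrative sketch of this threshold rule (the `filter_by_relevance` helper, the tuple layout, and the numeric correlation scores are assumptions; in practice the correlation degree would be produced by the data cleaning model):

```python
def filter_by_relevance(pairs, threshold=0.6):
    """Keep only transcribed sentences whose correlation degree with their
    input sentence is greater than or equal to the threshold.
    `pairs` is a list of (input_sentence, transcribed_sentence, score)."""
    return [out for _, out, score in pairs if score >= threshold]

candidates = [
    ("so um the budget is 5k", "The budget is 5k.", 0.92),
    ("uh okay yeah", "Okay.", 0.31),   # low correlation: likely noise, dropped
]
print(filter_by_relevance(candidates))  # -> ['The budget is 5k.']
```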
Optionally, the target multimedia is conference multimedia, and the data cleaning model is a generative deep learning model.
Optionally, the initial summary module 303 is specifically configured to:
inputting the target text into a summary generation model, and determining the initial summary.
Optionally, the apparatus further includes a mapping module, configured to: after the initial text is subjected to transcription processing based on a data cleaning model to obtain a target text,
establishing a mapping relation between the input text sentences in the initial text and the target text sentences in the target text.
Optionally, the target summary module 304 is specifically configured to:
determining an associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as the target summary.
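This mapping step can be sketched as follows (the `map_back` helper, the index-based mapping dictionary, and the toy sentences are assumptions rather than the disclosed implementation):

```python
def map_back(initial_summary, mapping, initial_text):
    """For each target-sentence index selected into the initial summary, look up
    the associated sentence in the initial text via the saved mapping relation;
    the associated sentences together form the target summary."""
    return [initial_text[mapping[i]] for i in initial_summary]

initial_text = ["um the launch is Friday", "uh huh", "we need two reviewers"]
mapping = {0: 0, 1: 2}   # target-sentence index -> initial-sentence index
initial_summary = [1]    # the summary model selected target sentence 1
print(map_back(initial_summary, mapping, initial_text))  # -> ['we need two reviewers']
```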
Optionally, the apparatus further includes a preprocessing module, specifically configured to: after the initial text of the target multimedia is obtained,
carrying out sentence division on the initial text to obtain a plurality of input text sentences;
preprocessing the input text sentences based on a set rule.
Optionally, the preprocessing module is specifically configured to:
deleting erroneous punctuation and/or nonsense words in the input text sentence.
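A minimal regex sketch of such set-rule preprocessing (the specific patterns and the filler-word list are assumptions, not the rules used by the disclosure):

```python
import re

FILLERS = re.compile(r"\b(um+|uh+|er+|you know)\s*", re.I)

def preprocess(sentence: str) -> str:
    """Set-rule preprocessing: delete erroneous punctuation and nonsense words."""
    s = re.sub(r"([,.!?;])\1+", r"\1", sentence)  # ",," -> ",", "!!" -> "!"
    s = FILLERS.sub("", s)                        # strip filler/nonsense words
    return re.sub(r"\s+([,.!?;])", r"\1", s).strip()

print(preprocess("The,, deadline is uh Friday !!"))  # -> The, deadline is Friday!
```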
The summary generation apparatus provided by the embodiments of the present disclosure can execute the summary generation method provided by any embodiment of the present disclosure, and has functional modules corresponding to the executed method and the corresponding beneficial effects.
Embodiments of the present disclosure also provide a computer program product, comprising a computer program/instructions, which when executed by a processor, implement the summary generation method provided in any of the embodiments of the present disclosure.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring now specifically to fig. 4, a schematic diagram of an electronic device 400 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), and the like, and fixed terminals such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program performs the above-described functions defined in the summary generation method of the embodiments of the present disclosure when executed by the processing apparatus 401.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an initial text of a target multimedia; performing transcription processing on the initial text based on a data cleaning model to obtain a target text; generating an initial summary based on the target text; and mapping the initial summary to the initial text, and determining a corresponding target summary.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In accordance with one or more embodiments of the present disclosure, there is provided a summary generation method including:
acquiring an initial text of a target multimedia;
performing transcription processing on the initial text based on a data cleaning model to obtain a target text;
generating an initial summary based on the target text;
and mapping the initial summary to the initial text, and determining a corresponding target summary.
According to one or more embodiments of the present disclosure, in a summary generation method provided by the present disclosure, a transcription process is performed on an initial text based on a data cleansing model to obtain a target text, including:
and inputting the input text sentence in the initial text into the data cleaning model for transcription operation, and determining a target text consisting of the transcribed target text sentence.
According to one or more embodiments of the present disclosure, in the summary generation method provided by the present disclosure, the method further includes:
in the process of transcribing the input text sentences in the initial text, outputting the target text sentences according to the degree adjusting parameter and/or the correlation degree in the data cleaning model; the degree adjusting parameter is used for representing the degree of transcription, and the correlation degree is used for representing the correlation between the transcribed target text sentence and the input text sentence before transcription.
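Purely as an illustration of how such a degree adjusting parameter could behave, the sketch below scores candidate transcriptions by interpolating between similarity to the input wording and the model's own score; the `pick_transcription` helper, the candidate scores, and the use of `difflib` similarity are assumptions, not the patent's actual NMT mechanism:

```python
from difflib import SequenceMatcher

def pick_transcription(input_sentence, candidates, degree=0.5):
    """degree near 0 favors candidates that preserve the input syntactic
    structure; degree near 1 favors the model's preferred rewrite.
    `candidates` is a list of (text, model_score) with model_score in [0, 1]."""
    def score(text, model_score):
        similarity = SequenceMatcher(None, input_sentence, text).ratio()
        return (1 - degree) * similarity + degree * model_score
    return max(candidates, key=lambda c: score(*c))[0]

cands = [("so the plan is we ship friday", 0.4),   # preserves input wording
         ("The plan is to ship on Friday.", 0.9)]  # heavier, more fluent rewrite
print(pick_transcription("so the plan is we ship friday", cands, degree=0.1))
print(pick_transcription("so the plan is we ship friday", cands, degree=0.9))
```

With a small degree the input-preserving candidate wins; with a large degree the rewritten candidate wins, mirroring the behavior described above.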
According to one or more embodiments of the present disclosure, in a summary generation method, outputting the target text sentence according to the relevance in the data cleansing model includes:
and determining the input text sentence with the correlation degree larger than or equal to the correlation degree threshold value after the transcription as the target text sentence, and outputting the target text sentence.
According to one or more embodiments of the present disclosure, in a summary generation method, the target multimedia is conference multimedia, and the data cleaning model is a generative deep learning model.
According to one or more embodiments of the present disclosure, in a summary generation method, generating an initial summary based on the target text includes:
and inputting the target text into a summary generation model, and determining the initial summary.
According to one or more embodiments of the present disclosure, in the summary generation method provided by the present disclosure, after the transcription processing is performed on the initial text based on the data cleansing model to obtain the target text, the method further includes:
and establishing a mapping relation between the input text sentence in the initial text and the target text sentence in the target text.
According to one or more embodiments of the present disclosure, in a summary generation method, mapping the initial summary into the initial text, and determining a corresponding target summary, the method includes:
determining an associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as the target summary.
According to one or more embodiments of the present disclosure, in the summary generation method, after obtaining the initial text of the target multimedia, the method further includes:
carrying out sentence division on the initial text to obtain a plurality of input text sentences;
and preprocessing the input text sentence based on a set rule.
According to one or more embodiments of the present disclosure, in a summary generation method provided by the present disclosure, the preprocessing the input text sentence based on a set rule includes:
deleting erroneous punctuation and/or nonsense words in the input text sentence.
In accordance with one or more embodiments of the present disclosure, there is provided a summary generation apparatus including:
the text acquisition module is used for acquiring an initial text of the target multimedia;
the transcription module is used for performing transcription processing on the initial text based on the data cleaning model to obtain a target text;
an initial summary module for generating an initial summary based on the target text;
and the target summary module is used for mapping the initial summary to the initial text and determining a corresponding target summary.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the transcription module is specifically configured to:
and inputting the input text sentence in the initial text into the data cleaning model for transcription operation, and determining a target text consisting of the transcribed target text sentence.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the apparatus further includes a transcription control module configured to:
in the process of transcribing the input text sentences in the initial text, output the target text sentences according to the degree adjusting parameter and/or the correlation degree in the data cleaning model.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the transcription control module is specifically configured to:
and determining the input text sentence with the correlation degree larger than or equal to the correlation degree threshold value after the transcription as the target text sentence, and outputting the target text sentence.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the target multimedia is conference multimedia, and the data cleaning model is a generative deep learning model.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the initial summary module is specifically configured to:
and inputting the target text into a summary generation model, and determining the initial summary.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the apparatus further includes a mapping module configured to: after the initial text is subjected to transcription processing based on a data cleaning model to obtain a target text,
and establishing a mapping relation between the input text sentence in the initial text and the target text sentence in the target text.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the target summary module is specifically configured to:
determining an associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as the target summary.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the apparatus further includes a preprocessing module, specifically configured to: after the initial text of the target multimedia is obtained,
carrying out sentence division on the initial text to obtain a plurality of input text sentences;
and preprocessing the input text sentence based on a set rule.
According to one or more embodiments of the present disclosure, in the summary generation apparatus provided by the present disclosure, the preprocessing module is specifically configured to:
deleting erroneous punctuation and/or nonsense words in the input text sentence.
In accordance with one or more embodiments of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the summary generation methods provided by the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing any of the summary generation methods provided by the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, a technical solution in which the above features are interchanged with (but not limited to) features with similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (11)

1. A summary generation method, comprising:
acquiring an initial text of a target multimedia;
performing transcription processing on the initial text based on a data cleaning model to obtain a target text;
generating an initial summary based on the target text;
mapping the initial summary to the initial text, and determining a corresponding target summary;
the method for transferring the initial text based on the data cleaning model to obtain the target text comprises the following steps:
inputting the input text sentences in the initial text into the data cleaning model for transcription operation, and determining a target text consisting of the transcribed target text sentences;
and, the method further comprises:
in the process of transcribing the input text sentences in the initial text, outputting the target text sentences according to the degree adjusting parameter and/or the correlation degree in the data cleaning model; the degree adjusting parameter is used for representing the degree of transcription, and the correlation degree is used for representing the correlation between the transcribed target text sentence and the input text sentence before transcription.
2. The method of claim 1, wherein outputting the target text statement according to the relevance in the data cleansing model comprises:
and determining the input text sentence with the correlation degree larger than or equal to the correlation degree threshold value after the transcription as the target text sentence, and outputting the target text sentence.
3. The method of claim 1, wherein the target multimedia is conference multimedia and the data cleansing model is a generative deep learning model.
4. The method of claim 1, wherein generating an initial summary based on the target text comprises:
and inputting the target text into a summary generation model, and determining the initial summary.
5. The method of claim 1, further comprising, after the transcribing the initial text based on the data cleansing model to obtain the target text:
and establishing a mapping relation between the input text sentence in the initial text and the target text sentence in the target text.
6. The method of claim 5, wherein mapping the initial summary into the initial text, determining a corresponding target summary, comprises:
determining an associated sentence corresponding to the initial summary in the initial text based on the mapping relation, and determining the associated sentence as the target summary.
7. The method of claim 1, wherein after obtaining the initial text of the target multimedia, further comprising:
carrying out sentence division on the initial text to obtain a plurality of input text sentences;
and preprocessing the input text sentence based on a set rule.
8. The method of claim 7, wherein the preprocessing the input text sentence based on the set rule comprises:
deleting erroneous punctuation and/or nonsense words in the input text sentence.
9. A summary generation apparatus, comprising:
the text acquisition module is used for acquiring an initial text of the target multimedia;
the transcription module is used for performing transcription processing on the initial text based on the data cleaning model to obtain a target text;
an initial summary module for generating an initial summary based on the target text;
the target summary module is used for mapping the initial summary to the initial text and determining a corresponding target summary;
the method for transferring the initial text based on the data cleaning model to obtain the target text comprises the following steps:
inputting the input text sentences in the initial text into the data cleaning model for transcription operation, and determining a target text consisting of the transcribed target text sentences;
and, the apparatus further comprises a transcription control module:
the transcription control module is used for outputting the target text sentences according to the degree adjusting parameter and/or the correlation degree in the data cleaning model in the process of transcribing the input text sentences in the initial text; the degree adjusting parameter is used for representing the degree of transcription, and the correlation degree is used for representing the correlation between the transcribed target text sentence and the input text sentence before transcription.
10. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the summary generation method of any of claims 1-8.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the summary generation method as claimed in any one of claims 1 to 8.
CN202110156415.8A 2021-02-04 2021-02-04 Summary generation method, device, equipment and medium Active CN112836476B (en)

Publications (2)

Publication Number Publication Date
CN112836476A (en) 2021-05-25
CN112836476B (en) 2022-02-22

CN111414748A (en) Traffic data processing method and device
WO2022121859A1 (en) Spoken language information processing method and apparatus, and electronic device
CN111027332B (en) Method and device for generating translation model
CN110852043B (en) Text transcription method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant