CN111428024A - Method and device for extracting text abstract, computer storage medium and terminal - Google Patents

Method and device for extracting text abstract, computer storage medium and terminal

Info

Publication number
CN111428024A
Authority
CN
China
Prior art keywords
sentences
sentence
text
extraction
abstract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010190805.2A
Other languages
Chinese (zh)
Inventor
陈栋
付骁弈
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010190805.2A priority Critical patent/CN111428024A/en
Publication of CN111428024A publication Critical patent/CN111428024A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A method, a device, a computer storage medium and a terminal for extracting a text abstract, comprising: performing sentence embedding, based on a preset language model, on sentences segmented from an original text to obtain a sentence vector for each sentence; calculating the similarity between sentences according to the obtained sentence vectors; and performing abstract extraction according to the calculated similarity between the sentences; wherein the language model is used for generating sentence vectors with a preset dimension. Because sentence embedding yields sentence vectors with a preset dimension, controlling the sentence vector dimension controls the dimension of the computation, which improves the efficiency of text abstract extraction.

Description

Method and device for extracting text abstract, computer storage medium and terminal
Technical Field
The present disclosure relates to, but is not limited to, natural language analysis technologies, and in particular to a method, an apparatus, a computer storage medium, and a terminal for extracting a text abstract.
Background
A text abstract summarizes the important content of one or more documents as concisely as possible. A good-quality text abstract can play an important role in information retrieval; for example, using the text abstract in place of the original document for indexing can effectively shorten retrieval time, reduce redundant information in retrieval results, and improve the user experience.
With the advent of the information explosion era, automatic text summarization is becoming an important research topic in the field of natural language processing. Automatic text summarization refers to using an artificial intelligence algorithm to automatically extract the key information in a text and generate a text abstract of a specified length. According to how the text abstract is generated, automatic text summarization can be divided into extractive text summarization, generative (abstractive) text summarization, and compressive text summarization.
Extractive text summarization calculates the weight of sentence components in the original text and extracts ready-made sentences from the original text to form the text abstract, so the error rate in grammar and syntax is low and the quality of the text abstract is ensured to a certain extent. Currently, extractive text summarization methods mainly perform word frequency statistics on the original text based on a bag-of-words (BOW) model and obtain sentence vectors from the word frequency statistics. The sentence vectors obtained this way form a sparse, high-dimensional matrix, so the weight calculation requires a large amount of computation. In addition, word frequency statistics do not capture the similarity of words, so similar sentences may be extracted into the abstract at the same time; word frequency statistics also ignore sentence order, which can cause syntactic problems in the abstract. Methods that obtain sentence vectors from averaged word vectors use word embedding and, through algorithmic improvements, take the similarity between sentences into account, but the generated sentence vectors have the same dimension as those of the bag-of-words model, so the amount of computation remains large.
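The dimensionality problem described above can be illustrated with a minimal sketch. It assumes whitespace tokenization (Chinese text would need a word segmenter instead), and the function name `bag_of_words_vectors` is hypothetical; it is not taken from the patent. The point it shows is that the bag-of-words vector length equals the vocabulary size, so it grows with the text, and that many entries are zero.

```python
from collections import Counter


def bag_of_words_vectors(sentences):
    """Return (vocabulary, vectors): one term-frequency vector per sentence."""
    # Assumption: whitespace tokenization, for illustration only.
    tokenized = [s.lower().split() for s in sentences]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocabulary)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocabulary, vectors


sentences = [
    "natural language processing is a science",
    "it is part of computer science",
]
vocabulary, vectors = bag_of_words_vectors(sentences)

# The vector dimension equals the vocabulary size, not a preset value.
dimension = len(vocabulary)
# Fraction of zero entries across all sentence vectors (sparsity).
zero_fraction = sum(v == 0 for vec in vectors for v in vec) / (len(vectors) * dimension)
```

With a realistic corpus the vocabulary reaches many thousands of words, which is the large, sparse dimension the background refers to.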
In summary, the efficiency of existing methods for extracting a text abstract still needs to be improved.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a method, a device, a computer storage medium and a terminal for extracting a text abstract, which can improve the quality and efficiency of generating the text abstract.
The embodiment of the invention provides a method for extracting a text abstract, which comprises the following steps:
sentence embedding is carried out on sentences segmented from an original text based on a preset language model to obtain sentence vectors of the sentences;
calculating the similarity between sentences according to all the obtained sentence vectors;
performing abstract extraction according to the calculated similarity between the sentences;
the language model is used for generating sentence vectors with a preset dimension.
On the other hand, an embodiment of the present invention further provides a computer storage medium, where a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the method for extracting a text abstract is implemented.
In another aspect, an embodiment of the present invention further provides a terminal, including: a memory and a processor, the memory having a computer program stored therein; wherein,
the processor is configured to execute the computer program in the memory;
the computer program, when executed by the processor, implements a method of text summarization as described above.
In another aspect, an embodiment of the present invention further provides a device for extracting a text abstract, where the device includes: the device comprises an embedding unit, a calculating unit and an extracting unit; wherein,
the embedding unit is used for: sentence embedding is carried out on sentences segmented from an original text based on a preset language model to obtain sentence vectors of the sentences;
the computing unit is to: calculating the similarity between sentences according to all the obtained sentence vectors;
the extraction unit is used for: performing abstract extraction according to the calculated similarity between the sentences;
the language model is used for generating sentence vectors with a preset dimension.
Sentence embedding yields sentence vectors with a preset dimension; controlling the sentence vector dimension controls the dimension of the computation, which improves the efficiency of text abstract extraction.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the example serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart of a method for extracting a text abstract according to an embodiment of the present invention;
FIG. 2 is a block diagram of an apparatus for extracting a text abstract according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The steps illustrated in the flow charts of the figures may be performed in a computer system as a set of computer-executable instructions. Also, while a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one shown here.
Fig. 1 is a flowchart of a method for extracting a text abstract according to an embodiment of the present invention, as shown in fig. 1, including:
step 101, performing sentence embedding, based on a preset language model, on sentences segmented from an original text to obtain a sentence vector for each sentence;
the language model is used for generating sentence vectors with a preset dimension.
In the embodiment of the invention, the dimension of the sentence vector is determined by the language model; for example, if the vector dimension is defined as 512, each sentence is expressed as a 512-dimensional vector by the pre-trained language model. Processing the original text through sentence embedding yields an N × M matrix, where N represents the number of sentences and M represents the sentence embedding dimension. By choosing the language model, the embodiment of the invention controls the dimension of the sentence vectors, and thereby controls the amount of computation.
In an exemplary embodiment, before performing sentence embedding based on the preset language model to obtain the sentence vector of each sentence, the method in the embodiment of the present invention further includes:
cleaning the original text.
In an exemplary embodiment, before performing sentence embedding based on the preset language model to obtain the sentence vector of each sentence, the method in the embodiment of the present invention further includes:
performing sentence segmentation on the original text.
The cleaning and sentence segmentation of the original text may be performed according to principles in the related art; the cleaning method may differ for original texts in different fields and of different forms. In an exemplary embodiment, the cleaning may be performed by one or any combination of the following: converting English punctuation into Chinese punctuation; removing redundant symbols; converting between full-width and half-width characters; and processing special symbols (e.g., emoticons, kaomoji, etc.).
The sentence segmentation method may likewise differ for original texts in different fields and application scenarios; in an exemplary embodiment, sentence segmentation may be performed at one or any combination of the following punctuation marks: comma, period, exclamation mark, and question mark.
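The cleaning and segmentation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `clean` and `split_sentences` are hypothetical, full-width to half-width conversion is approximated with Unicode NFKC normalization, "redundant symbols" are taken to mean repeated punctuation and extra whitespace, and the blanket replacement of `.` would need refinement for decimals and abbreviations.

```python
import re
import unicodedata

# Assumed mapping from English punctuation to Chinese punctuation.
EN_TO_ZH = str.maketrans({",": "，", ".": "。", "!": "！", "?": "？"})
# Segmentation marks named in the text: comma, period, exclamation mark, question mark.
SPLIT_PATTERN = re.compile(r"[，。！？]")


def clean(text):
    """Normalize width, unify punctuation, and drop redundant symbols."""
    text = unicodedata.normalize("NFKC", text)      # full-width -> half-width
    text = text.translate(EN_TO_ZH)                 # English -> Chinese punctuation
    text = re.sub(r"\s+", " ", text).strip()        # collapse redundant whitespace
    text = re.sub(r"([，。！？])\1+", r"\1", text)   # collapse repeated punctuation
    return text


def split_sentences(text):
    """Split cleaned text at the segmentation marks, dropping empty pieces."""
    return [part.strip() for part in SPLIT_PATTERN.split(text) if part.strip()]


raw = "Natural language processing is important!! It studies theories, and methods.  Is it science?"
sentences = split_sentences(clean(raw))
```

Splitting at commas produces short fragments such as "and methods", which matches the behaviour seen in the application example, where "thus" becomes a sentence of its own.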
In an exemplary embodiment, the language model includes any one of:
an embedded language model (ELMo, Embeddings from Language Models) and a bidirectional pre-trained language model (BERT, Bidirectional Encoder Representations from Transformers).
The language model is a model existing in the related art; the language model of the embodiment of the present invention may also be a model that those skilled in the art train, following principles in the related art, for the field to which the original text belongs.
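The shape contract of step 101 (N sentences in, an N × M matrix out, with M fixed in advance) can be sketched with a stand-in embedder. The function `embed_sentence` below is purely hypothetical and is not a language model: it hashes character bigrams into M buckets so that the example runs without any pretrained weights. In practice a pretrained model such as ELMo or BERT would take its place; only the fixed output dimension is the point here.

```python
import hashlib


def embed_sentence(sentence, dim=512):
    """Placeholder embedder (assumption): hash character bigrams into `dim` buckets."""
    vec = [0.0] * dim
    for i in range(len(sentence) - 1):
        bigram = sentence[i:i + 2]
        digest = hashlib.md5(bigram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    return vec


def embed_all(sentences, dim=512):
    """Return an N x M matrix as a list of N vectors of length M = dim."""
    return [embed_sentence(s, dim) for s in sentences]


sentences = [
    "natural language processing is a science",
    "thus",
    "it is thus part of computer science",
]
matrix = embed_all(sentences)
shape = (len(matrix), len(matrix[0]))  # (N, M)
```

Whatever the length of a sentence or the size of the vocabulary, each row has the same preset length, which is what keeps the later similarity computation bounded.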
step 102, calculating the similarity between sentences according to the obtained sentence vectors;
it should be noted that, after the sentence vectors are obtained, the embodiment of the present invention may calculate the similarity between the sentences through matrix operations, finally obtaining an N × N matrix, where N represents the number of sentences.
The similarity of the sentence vectors may be calculated in different ways according to the situation, such as cosine similarity or Euclidean distance; taking cosine similarity as an example, the similarity is calculated as follows:
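$$\mathrm{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{M} A_i B_i}{\sqrt{\sum_{i=1}^{M} A_i^2} \, \sqrt{\sum_{i=1}^{M} B_i^2}}$$
where A and B are the sentence vectors of two sentences and M is the sentence embedding dimension.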
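The N × N similarity matrix of step 102 can be sketched in plain Python as follows. The function names are hypothetical, and returning 0.0 for a zero vector is an assumption made here to avoid division by zero; the patent does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Return the N x N matrix of pairwise cosine similarities."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Three toy sentence vectors with M = 2, for illustration only.
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sim = similarity_matrix(vectors)
```

Each entry costs on the order of M multiplications, so the whole matrix costs on the order of N² × M operations; this is why fixing M bounds the amount of computation.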
step 103, performing abstract extraction according to the calculated similarity between the sentences;
in an exemplary embodiment, performing abstract extraction according to the calculated similarity between sentences includes:
determining abstract extraction reference information according to the calculated similarity between the sentences;
performing sentence extraction according to the determined abstract extraction reference information to obtain a text abstract;
wherein the abstract extraction reference information comprises: the content of the sentences, the weight ordering of the sentences and the position information of the sentences in the original text.
It should be noted that, after the abstract extraction reference information is determined, the embodiment of the present invention may perform sentence extraction according to the principles of extractive summarization in the related art.
In an exemplary embodiment, determining the abstract extraction reference information according to the calculated similarity between sentences includes:
processing the calculated similarity between the sentences according to a text ranking (TextRank) algorithm to obtain a text ranking result containing the abstract extraction reference information.
It should be noted that the position information in the embodiment of the present invention may include, for example, a position index. TextRank is a graph-based ranking algorithm for text in the related art, and a text ranking result can be obtained by running the TextRank algorithm. The text is divided into several constituent units (sentences), and a graph is constructed with the sentences as nodes and the similarity between sentences as edge weights; the TextRank value of each sentence is calculated by loop iteration until convergence, and the iteration formula is as follows:
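$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
where WS(V_i) is the TextRank value of sentence V_i, d is the damping factor (commonly set to 0.85), In(V_i) is the set of nodes with edges pointing to V_i, Out(V_j) is the set of nodes that V_j points to, and w_{ji} is the similarity between sentences V_j and V_i.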
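The loop iteration described above can be sketched as follows, assuming the standard weighted TextRank update with a damping factor d. The function name `textrank`, the default d = 0.85, the tolerance, and the choice to ignore self-similarity are all assumptions of this sketch and are not taken from the patent. Scores produced this way sum to N; an implementation that normalizes them to sum to 1 would give the same ranking.

```python
def textrank(sim, d=0.85, tol=1e-6, max_iter=100):
    """Iterate the weighted TextRank update on an N x N similarity matrix."""
    n = len(sim)
    # Edge weights: pairwise similarity, with self-loops removed (assumption).
    weights = [[sim[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to w_ji.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0.0
            )
            new_scores.append((1.0 - d) + d * incoming)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy 3-sentence similarity matrix, for illustration only.
sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.8],
    [0.1, 0.8, 1.0],
]
scores = textrank(sim)
```

In the toy matrix the middle sentence is strongly similar to both others, so it receives the highest score.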
The embodiment of the invention performs sentence embedding through the selected language model to obtain sentence vectors with a preset dimension; controlling the dimension of the sentence vectors controls the dimension of the computation, which improves the efficiency of text abstract extraction.
The embodiment of the invention also provides a computer storage medium, wherein a computer program is stored in the computer storage medium, and when being executed by a processor, the computer program realizes the method for extracting the text abstract.
An embodiment of the present invention further provides a terminal, including: a memory and a processor, the memory having a computer program stored therein; wherein,
the processor is configured to execute the computer program in the memory;
the computer program, when executed by a processor, implements a method for performing text summarization as described above.
Fig. 2 is a block diagram of a device for extracting a text abstract according to an embodiment of the present invention, as shown in fig. 2, including: the device comprises an embedding unit, a calculating unit and an extracting unit; wherein,
the embedding unit is used for: sentence embedding is carried out on sentences segmented from an original text based on a preset language model to obtain sentence vectors of the sentences;
the computing unit is to: calculating the similarity between sentences according to the obtained sentence vectors;
the extraction unit is used for: performing abstract extraction according to the calculated similarity between the sentences;
the language model is used for generating sentence vectors with a preset dimension.
In an exemplary embodiment, an apparatus of an embodiment of the present invention further includes a preprocessing unit, configured to:
and cleaning the original text.
In an exemplary embodiment, the preprocessing unit of the device of the embodiment of the present invention is further configured to:
and performing sentence segmentation on the original text.
In an exemplary embodiment, the language model includes any one of:
an embedded language model ELMo and a bidirectional pre-trained language model BERT.
In an exemplary embodiment, the extraction unit includes a determination module and an extraction module; wherein,
the determination module is to: determining abstract extraction reference information according to the calculated similarity between the sentences;
the extraction module is used for: performing sentence extraction according to the determined abstract extraction reference information to obtain a text abstract;
wherein the abstract extraction reference information includes: the content of the sentences, the weight ordering of the sentences and the position information of the sentences in the original text.
In an exemplary embodiment, the determining module is specifically configured to:
process the calculated similarity between the sentences according to the TextRank text ranking algorithm to obtain a text ranking result containing the abstract extraction reference information.
Sentence embedding yields sentence vectors with a preset dimension; controlling the sentence vector dimension controls the dimension of the computation, which improves the efficiency of text abstract extraction.
The following is a brief description of the embodiments of the present invention by way of application examples, which are only used to illustrate the embodiments of the present invention and are not used to limit the scope of the present invention.
Application example
The application example is illustrated by taking the following original text as an example: Natural language processing is computer
[Figure BDA0002415834350000071: continuation of the original text, shown as an image in the source]
Natural language processing does not generally study natural language, but develops computer systems, particularly the software systems therein, that can effectively realize natural language communication.
The original text is cleaned by the following methods: converting English punctuation into Chinese punctuation; removing redundant symbols; converting between full-width and half-width characters; and processing special symbols (emoticons, kaomoji, and the like). The cleaned text is:
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will relate to natural language, i.e. the language that people use everyday, so it is closely related to the research of linguistics, but has important difference. Natural language processing is not a general study of natural language but is directed to the development of computer systems, and particularly software systems therein, that can efficiently implement natural language communications. It is thus part of computer science.
The embodiment of the invention performs sentence segmentation on the cleaned text; in this application example, sentences are segmented at commas, periods, exclamation marks and question marks. The sentences obtained after segmentation are:
natural language processing is an important direction in the fields of computer science and artificial intelligence;
it studies various theories and methods that can achieve effective communication between people and computers using natural language;
natural language processing is a science integrating linguistics, computer science and mathematics;
thus;
research in this area will involve natural language;
namely the language used by people in daily life;
therefore, it is closely related to the research of linguistics;
but with important differences;
natural language processing does not generally study natural language;
but to develop a computer system capable of effectively implementing natural language communication;
in particular software systems thereof;
it is thus part of computer science.
Assuming that the dimension of the sentence vector is set to 512 in this application example, each sentence is expressed as a 512-dimensional sentence vector by the pre-trained language model, finally yielding an N × M matrix, where N represents the number of sentences and M represents the sentence embedding dimension.
The sentence embedding result of this application example:
1. [0.95, 1.12, ..., 2.34]  # length 512
...
12. [1.67, 2.62, ..., 0.43]
After the sentence vector of each sentence is obtained, the embodiment of the invention can calculate the similarity between the sentences through matrix operations to obtain an N × N matrix, where N represents the number of sentences. The similarity of the sentence vectors may be calculated in different ways according to the situation, for example by cosine similarity; the result of the cosine similarity calculation in this application example is:
# 12 × 12 matrix
[
[1, 0.15, 0.46, ..., 1.12]
...
[1.15, 0.16, 0.12, ..., 1]
]
# The diagonal entries are all 1 (the similarity of a sentence with itself is 1)
After the similarity between sentences is calculated, the embodiment of the invention calculates the weight of each sentence in the original text through TextRank. Each entry of the TextRank result contains: 1. the weight (importance) of the sentence; 2. the position index of the sentence in the original text; 3. the content of the sentence. The calculation result of this application example is as follows:
[ (0.099090693997561027, 0, 'natural language processing is an important direction in the fields of computer science and artificial intelligence'),
(0.088957031707728049, 6, 'therefore, it is closely related to the research of linguistics'),
(0.088832091445133168, 1, 'it studies various theories and methods that can achieve effective communication between people and computers using natural language'),
(0.088421818743333713, 4, 'research in this area will involve natural language'),
(0.086919796865634419, 11, 'it is thus part of computer science'),
(0.083775503846167859, 2, 'natural language processing is a science integrating linguistics, computer science and mathematics'),
(0.083333333333333329, 3, 'thus'),
(0.082774986848257362, 8, 'natural language processing does not generally study natural language'),
(0.082390619289261002, 9, 'but to develop a computer system capable of effectively implementing natural language communication'),
(0.075705137538748288, 10, 'in particular software systems thereof'),
(0.070541980497901946, 7, 'but with important differences'),
(0.069257005886939754, 5, 'namely the language used by people in daily life') ].
After the text ranking result is obtained, the top-weighted sentences are selected according to the text ranking result, and the selected sentences are extracted and combined into the text abstract. The number of sentences to extract may be determined by those skilled in the art through analysis, and the selected sentences may be ordered according to their position information in the text ranking result, where the position information includes a position index. In this application example, assuming that the text abstract contains 3 sentences, the three sentences with the highest TextRank weights are selected according to the text ranking result, sorted by position index, separated by commas, and terminated with a period to obtain the text abstract:
Natural language processing is an important direction in the fields of computer science and artificial intelligence, it studies various theories and methods that can achieve effective communication between people and computers using natural language, therefore, it is closely related to the research of linguistics.
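The selection step of this application example can be sketched as follows. The function name `build_summary` is hypothetical; the tuples are a subset of the (weight, position index, sentence) entries of the ranking result, with the weights taken from the application example. The sketch picks the top-weighted sentences, restores their original order by position index, joins them with commas, and ends with a period.

```python
def build_summary(ranked, num_sentences=3):
    """ranked: list of (weight, position_index, sentence) tuples."""
    # Keep the sentences with the highest weights.
    top = sorted(ranked, key=lambda item: item[0], reverse=True)[:num_sentences]
    # Restore the order in which they appear in the original text.
    in_order = sorted(top, key=lambda item: item[1])
    return ", ".join(sentence for _, _, sentence in in_order) + "."


ranked = [
    (0.099090693997561027, 0, "natural language processing is an important direction "
                              "in the fields of computer science and artificial intelligence"),
    (0.088957031707728049, 6, "therefore, it is closely related to the research of linguistics"),
    (0.088832091445133168, 1, "it studies various theories and methods that can achieve effective "
                              "communication between people and computers using natural language"),
    (0.088421818743333713, 4, "research in this area will involve natural language"),
    (0.083333333333333329, 3, "thus"),
]
summary = build_summary(ranked, num_sentences=3)
```

Sorting by position index after selection keeps the abstract in reading order even though the sentence at index 6 outranks the sentence at index 1.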
This application example performs sentence embedding (Sentence Embedding) through the pre-trained language model and calculates the similarity between the sentences to determine the sentences to extract, which reduces the dimension of the sentence vectors and the amount of similarity computation, and improves the accuracy of the similarity calculation.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as known to those skilled in the art.

Claims (10)

1. A method for extracting text summaries comprises the following steps:
sentence embedding is carried out on sentences segmented from an original text based on a preset language model to obtain sentence vectors of the sentences;
calculating the similarity between sentences according to the obtained sentence vectors;
performing abstract extraction according to the calculated similarity between the sentences;
the language model is used for generating sentence vectors with a preset dimension.
2. The method according to claim 1, wherein before performing sentence embedding based on the preset language model to obtain the sentence vector of each sentence, the method further comprises:
cleaning the original text.
3. The method according to claim 1, wherein before performing sentence embedding based on the preset language model to obtain the sentence vector of each sentence, the method further comprises:
performing sentence segmentation on the original text.
4. The method of claim 1, wherein the language model comprises any of:
an embedded language model ELMo and a bidirectional pre-trained language model BERT.
5. The method according to any one of claims 1 to 4, wherein the extracting the summary according to the calculated similarity between the sentences comprises:
determining abstract extraction reference information according to the calculated similarity between the sentences;
performing statement extraction according to the determined abstract extraction reference information to obtain a text abstract;
wherein the abstract extraction reference information includes: the content of the sentences, the weight ordering of the sentences and the position information of the sentences in the original text.
6. The method of claim 5, wherein the determining abstract extraction reference information according to the calculated similarity between sentences comprises:
processing the calculated similarity between the sentences according to the TextRank text ranking algorithm to obtain a text ranking result containing the abstract extraction reference information.
7. A computer storage medium, in which a computer program is stored, which, when executed by a processor, implements the method for extracting a text abstract according to any one of claims 1 to 6.
8. A terminal, comprising: a memory and a processor, the memory having a computer program stored therein; wherein,
the processor is configured to execute the computer program in the memory;
the computer program, when executed by the processor, implements the method for extracting a text abstract according to any one of claims 1 to 6.
9. An apparatus for text abstract extraction, comprising: an embedding unit, a calculating unit and an extraction unit; wherein,
the embedding unit is used for: performing sentence embedding on sentences segmented from an original text, based on a preset language model, to obtain sentence vectors of the sentences;
the calculating unit is used for: calculating the similarity between sentences according to the obtained sentence vectors;
the extraction unit is used for: performing abstract extraction according to the calculated similarity between the sentences;
the language model is used for generating sentence vectors with preset dimensions.
10. The apparatus of claim 9, wherein the extraction unit comprises a determination module and an extraction module; wherein,
the determination module is used for: determining abstract extraction reference information according to the calculated similarity between the sentences;
the extraction module is used for: performing sentence extraction according to the determined abstract extraction reference information to obtain a text abstract;
wherein the abstract extraction reference information includes: the content of the sentences, the weight ordering of the sentences and the position information of the sentences in the original text.
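
The claimed flow (sentence vectors, pairwise similarity, TextRank ranking, extraction using weight ordering and original position) can be illustrated with a minimal sketch. This is not the applicant's implementation. The function names (`cosine_similarity`, `similarity_matrix`, `textrank`, `rank_sentences`, `extract_summary`), the choice of cosine similarity, the damping factor of 0.85 and the `top_k` parameter are all assumptions made for illustration, and the sentence vectors are expected to be supplied by the caller (for example from an ELMo or BERT encoder), since no language model is loaded here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Symmetric pairwise similarity; diagonal stays 0, negative values are clipped to 0."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = max(0.0, cosine_similarity(vectors[i], vectors[j]))
            sim[i][j] = sim[j][i] = s
    return sim


def textrank(sim, damping=0.85, iterations=100, tol=1e-8):
    """Weighted TextRank scores computed by power iteration over the similarity graph."""
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to the edge weight.
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0.0
            )
            new_scores.append((1.0 - damping) / n + damping * rank)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def rank_sentences(sentences, vectors):
    """Reference information per sentence: content, weight ordering (rank) and position."""
    scores = textrank(similarity_matrix(vectors))
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [
        {"content": sentences[i], "rank": rank, "position": i, "score": scores[i]}
        for rank, i in enumerate(order)
    ]


def extract_summary(sentences, vectors, top_k=3):
    """Keep the top_k highest-ranked sentences and emit them in original text order."""
    reference = rank_sentences(sentences, vectors)
    chosen = sorted(reference[:top_k], key=lambda item: item["position"])
    return [item["content"] for item in chosen]


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for real sentence embeddings.
    sentences = ["First point.", "Related point.", "Another related point.", "Off-topic aside."]
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
    print(extract_summary(sentences, vectors, top_k=3))
```

Sorting the selected sentences by position before output is one way to use the position information named in claims 5 and 10, so that the abstract reads in the order of the original text rather than in score order.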
CN202010190805.2A 2020-03-18 2020-03-18 Method and device for extracting text abstract, computer storage medium and terminal Pending CN111428024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010190805.2A CN111428024A (en) 2020-03-18 2020-03-18 Method and device for extracting text abstract, computer storage medium and terminal

Publications (1)

Publication Number Publication Date
CN111428024A true CN111428024A (en) 2020-07-17

Family

ID=71549534

Country Status (1)

Country Link
CN (1) CN111428024A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739973A (en) * 2018-12-20 2019-05-10 北京奇安信科技有限公司 Text snippet generation method, device, electronic equipment and storage medium
CN110287309A (en) * 2019-06-21 2019-09-27 深圳大学 The method of rapidly extracting text snippet
US20190370316A1 (en) * 2017-06-22 2019-12-05 Tencent Technology (Shenzhen) Company Limited Information processing method and related device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052308A (en) * 2020-08-21 2020-12-08 腾讯科技(深圳)有限公司 Abstract text extraction method and device, storage medium and electronic equipment
CN111984793A (en) * 2020-09-03 2020-11-24 平安国际智慧城市科技股份有限公司 Text emotion classification model training method and device, computer equipment and medium
CN112732900A (en) * 2021-01-04 2021-04-30 山东众阳健康科技集团有限公司 Electronic medical record text abstract extraction method
CN112732900B (en) * 2021-01-04 2022-07-29 山东众阳健康科技集团有限公司 Electronic medical record text abstract extraction method
CN113673215A (en) * 2021-07-13 2021-11-19 北京搜狗科技发展有限公司 Text abstract generation method and device, electronic equipment and readable medium
CN113420545A (en) * 2021-08-24 2021-09-21 平安科技(深圳)有限公司 Abstract generation method, device, equipment and storage medium
CN113420545B (en) * 2021-08-24 2021-11-09 平安科技(深圳)有限公司 Abstract generation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111428024A (en) Method and device for extracting text abstract, computer storage medium and terminal
Bai et al. A survey on automatic image caption generation
CN106844346B (en) Short text semantic similarity discrimination method and system based on deep learning model Word2Vec
CN107220232B (en) Keyword extraction method and device based on artificial intelligence, equipment and readable medium
Klementiev et al. Inducing crosslingual distributed representations of words
CN111177365A (en) Unsupervised automatic abstract extraction method based on graph model
US20120179454A1 (en) Apparatus and method for automatically generating grammar for use in processing natural language
CN110347790B (en) Text duplicate checking method, device and equipment based on attention mechanism and storage medium
Fang et al. Topic aspect-oriented summarization via group selection
CN112861514B (en) Attention-enhanced full-correlation variational self-encoder for partitioning grammar and semantics
CN109271641A (en) A kind of Text similarity computing method, apparatus and electronic equipment
Nishikawa et al. Learning to generate coherent summary with discriminative hidden semi-markov model
CN112487151B (en) Document generation method and device, storage medium and electronic equipment
CN110704608A (en) Text theme generation method and device and computer equipment
CN113918031A (en) System and method for Chinese punctuation recovery using sub-character information
CN111695358A (en) Method and device for generating word vector, computer storage medium and electronic equipment
Kang et al. Knowledge-consistent dialogue generation with knowledge graphs
Kettunen Keep, change or delete? setting up a low resource ocr post-correction framework for a digitized old finnish newspaper collection
CN111666379A (en) Event element extraction method and device
CN111091001B (en) Method, device and equipment for generating word vector of word
CN111401070B (en) Word meaning similarity determining method and device, electronic equipment and storage medium
CN117057349A (en) News text keyword extraction method, device, computer equipment and storage medium
CN110609997B (en) Method and device for generating abstract of text
CN114722774B (en) Data compression method, device, electronic equipment and storage medium
Dorr et al. Cross-language headline generation for Hindi

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200717