CN109597886B - Extraction generation mixed abstract generation method - Google Patents


Info

Publication number
CN109597886B
Authority
CN
China
Prior art keywords
abstract
key
sentence
document
key sentence
Legal status
Active
Application number
CN201811238086.6A
Other languages
Chinese (zh)
Other versions
CN109597886A (en)
Inventor
Zhou Yu (周玉)
Zhu Junnan (朱军楠)
Zhang Jiajun (张家俊)
Zong Chengqing (宗成庆)
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
2018-10-23
Filing date
2018-10-23
Publication date
2021-07-06
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201811238086.6A
Publication of CN109597886A
Application granted
Publication of CN109597886B
Legal status: Active (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of natural language processing and provides a hybrid extractive-abstractive summary generation method that addresses the shortcomings of existing extractive and abstractive automatic summarization methods. The method comprises: identifying entities and numbers in a document and replacing them with preset tags; extracting a plurality of first key sentences from the tag-replaced document with an extractive document summarization method; compressing each first key sentence to obtain a corresponding second key sentence; selecting, according to a comparison between the length of each first key sentence and a preset length threshold, either the first key sentence or its second key sentence as a first key sentence to be synthesized; and generating the summary of the document from all the first key sentences to be synthesized. The method generates summaries that conform to the semantics of the document while also ensuring readability.

Description

Extraction generation mixed abstract generation method
Technical Field
The invention belongs to the technical field of natural language processing, and in particular relates to a hybrid extractive-abstractive summary generation method.
Background
Automatic summarization is a technology in which a computer system automatically analyzes a text, condenses its content and generates a summary, expressing the main content of the original text concisely according to the needs of readers (or users). It helps readers (or users) find content of interest among retrieved articles and improves both reading speed and reading quality. It compresses a document into a more compact representation while preserving coverage of the valuable topics of the original document.
Existing automatic summarization techniques fall into two main categories: extractive and abstractive. Extractive methods assemble a summary from segments extracted from the document; they are simple to implement and produce readable output, but the resulting summary has limited precision. Abstractive methods generate the summary directly from a semantic representation of the document; they are considerably harder to realize but come closer to the essence of summarization.
Therefore, a scheme that filters out unimportant content in a document and preserves the fluency of the summary while also improving its precision is a problem that those skilled in the art currently need to solve.
Disclosure of Invention
To solve the above problems in the prior art, namely the shortcomings of existing extractive and abstractive automatic summarization methods, the invention provides a hybrid extractive-abstractive summary generation method comprising the following steps:
identifying entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method;
compressing each of the first key sentences to obtain a second key sentence corresponding to each first key sentence;
judging whether the length of the first key sentence is greater than or equal to a preset length threshold: if so, taking the second key sentence corresponding to the first key sentence as a first key sentence to be synthesized; if not, directly taking the first key sentence as the first key sentence to be synthesized;
and generating the summary of the document according to all the first key sentences to be synthesized.
In a preferred technical solution of the above method, the step of "extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method" includes:
extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method based on a submodular function;
acquiring, for each first key sentence, the corresponding original key sentence in the document before tag replacement;
and sorting the first key sentences according to the order of the corresponding original key sentences in the document before tag replacement.
In a preferred technical solution of the above method, the step of "compressing each of the first key sentences to obtain a second key sentence corresponding to each first key sentence" includes:
compressing the first key sentence with a pre-constructed sentence summarization model to obtain the corresponding second key sentence;
wherein the sentence summarization model is a model built on an attention mechanism.
In a preferred technical solution of the above method, the step of "compressing the first key sentence with a pre-constructed sentence summarization model to obtain the corresponding second key sentence" includes:
acquiring the unknown words generated when the first key sentence is compressed;
acquiring, for each unknown word, the word with the highest attention value at the time step at which the unknown word was generated, and replacing the unknown word with that word.
In a preferred technical solution of the above method, before the step of "compressing each of the first key sentences to obtain a second key sentence corresponding to each first key sentence", the method further includes:
identifying entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
and training the sentence summarization model on the tag-replaced text data set.
In a preferred technical solution of the above method, the step of "generating the summary of the document according to all the first key sentences to be synthesized" includes:
restoring the tags in each first key sentence to be synthesized to the corresponding entities and numbers to obtain a corresponding second key sentence to be synthesized;
and generating the summary of the document according to the second key sentences to be synthesized.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
1. The method extracts first key sentences with an extractive document summarization method, compresses them to obtain second key sentences, selects either the first or the second key sentence as the first key sentence to be synthesized according to how the length of the first key sentence compares with a preset length threshold, and generates the document summary from the first key sentences to be synthesized.
2. The method judges whether the length of each first key sentence is greater than or equal to the preset length threshold; if so, the corresponding second key sentence is used as the first key sentence to be synthesized, otherwise the first key sentence is used directly. This yields a more robust summary, i.e., one that is as readable as possible while retaining a reasonable degree of factual fidelity.
3. Extracting first key sentences from the document with an extractive method filters out unimportant content, so that the abstractive stage can then quickly generate the document summary, yielding a high-precision summary.
Drawings
Fig. 1 is a schematic diagram of the main steps of the hybrid summary generation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main framework of the hybrid summary generation method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Fig. 1 illustrates the main steps of the hybrid summary generation method in this embodiment. As shown in Fig. 1, the method includes the following steps:
Step S101: identify entities and numbers in the document and replace them with preset tags.
The invention is inspired by the manual summarization process (first selecting the important sentences of the original text, then condensing and rewriting them) and generates summaries of long texts with a hybrid extractive-abstractive method. The method uses extractive summarization to filter out unimportant content, while retaining the fluency of summaries produced by abstractive generation. It consists of two main parts: extracting the important sentences of the document, and compressing and rewriting the extracted sentences.
Specifically, entities and numbers in the document are identified and replaced with preset tags. For example, given the input document:
It’s just an example for illustration. There are 56 nationalities in China.
the document after its entities and numbers have been replaced with preset tags is:
It’s just an example for illustration. There are number-1 nationalities in entity-1.
Named entities may be person names, organization names, place names and other entities identified by a name; in a broader sense, entities may also include numbers, dates, currencies, addresses and so on. The entities and numbers in the document can be identified with the named entity recognition tool spaCy and replaced with the preset tags using Python regular expressions.
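The tag-replacement step can be sketched in Python as below. This is a minimal illustration under assumptions: the entity list is passed in directly (in practice it would come from a NER tool such as spaCy, e.g. the `text` of each span in `doc.ents`), numbers are matched with a simple regular expression, and the function name `replace_with_tags` is hypothetical. The returned mapping is kept so that the tags can be restored in step S107.

```python
import re

# Integers and decimals such as 56, 3.5 or 1,200. The look-behind skips digits
# that already belong to a tag like "entity-1" (tag scheme taken from the
# example above).
NUMBER_RE = re.compile(r"(?<![\w-])\d+(?:[.,]\d+)*(?!\w)")


def replace_with_tags(text, entities):
    """Replace entities and numbers with numbered tags.

    `entities` stands in for named-entity recognizer output; here it is a
    plain list of surface strings. Returns (tagged_text, tag -> original).
    """
    mapping = {}
    counter = 0
    # Number entities by first appearance; on ties, longer strings first so
    # "New York" is tagged before "York".
    for ent in sorted(set(entities), key=lambda e: (text.find(e), -len(e))):
        pattern = r"(?<!\w)" + re.escape(ent) + r"(?!\w)"
        if not re.search(pattern, text):
            continue  # absent, or already covered by a longer entity
        counter += 1
        tag = f"entity-{counter}"
        mapping[tag] = ent
        text = re.sub(pattern, tag, text)

    number_tags = {}

    def tag_number(match):
        # Reuse the same tag when the same number appears again.
        value = match.group(0)
        if value not in number_tags:
            number_tags[value] = f"number-{len(number_tags) + 1}"
            mapping[number_tags[value]] = value
        return number_tags[value]

    return NUMBER_RE.sub(tag_number, text), mapping
```

Applied to the example sentence with `["China"]` as the entity list, this produces the tagged text shown above together with the mapping `{"entity-1": "China", "number-1": "56"}`.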
Step S102: extract a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method.
Extractive document summarization selects representative text segments from the original document to form a summary; the segments may be sentences, paragraphs or sections of the document. Specifically, a plurality of first key sentences are extracted from the tag-replaced document with an extractive summarization method based on a submodular function; the original key sentences corresponding to the first key sentences are located in the document before tag replacement, and the first key sentences are sorted according to the order of these original key sentences in that document. The total number of words in the first key sentences is kept below a preset word-count threshold, for example 200.
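The patent does not say which submodular function is used. The sketch below assumes a weighted word-coverage objective (a standard monotone submodular function) maximized with a cost-scaled greedy heuristic under the word budget, then re-sorts the selected sentences into document order. Because tag replacement does not change sentence boundaries, a sentence's index in the tag-replaced document equals the index of its original key sentence. The word-frequency weighting and the function names are illustrative assumptions.

```python
from collections import Counter


def coverage(selected, sentences, weights):
    """Weighted word coverage: total weight of the distinct words covered by
    the selected sentences. Monotone and submodular (assumed objective)."""
    covered = set()
    for i in selected:
        covered.update(sentences[i].lower().split())
    return sum(weights[w] for w in covered)


def extract_key_sentences(sentences, budget=200):
    """Greedily pick sentences whose total word count stays below `budget`,
    then return their indices in original document order."""
    # Hypothetical word weights: frequency of each word in the document.
    weights = Counter(w for s in sentences for w in s.lower().split())
    selected, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        base = coverage(selected, sentences, weights)
        best, best_ratio = None, 0.0
        for i in remaining:
            cost = len(sentences[i].split())
            if cost == 0 or used + cost >= budget:
                continue  # empty, or would reach the word budget
            gain = coverage(selected + [i], sentences, weights) - base
            if gain / cost > best_ratio:
                best, best_ratio = i, gain / cost
        if best is None:
            break  # nothing fits, or nothing adds coverage
        selected.append(best)
        used += len(sentences[best].split())
        remaining.discard(best)
    # Sorting by index restores the order of the original key sentences.
    return sorted(selected)
```

Dividing the marginal gain by sentence length keeps the greedy step from favoring long sentences that would exhaust the budget early.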
Step S103: compress each of the first key sentences to obtain the corresponding second key sentence.
Although the first key sentences extracted by the extractive method already exclude unimportant content, a summary built directly from them has limited precision. To make the generated summary better reflect the meaning of the document and read more like a human-written summary, the first key sentences are compressed. Specifically, each first key sentence is compressed by a pre-constructed sentence summarization model to obtain the corresponding second key sentence, where the sentence summarization model is built on an attention mechanism.
Compressing the first key sentence with the pre-constructed sentence summarization model to obtain the corresponding second key sentence includes the following steps:
acquiring the unknown words generated when the first key sentence is compressed;
acquiring, for each unknown word, the word with the highest attention value at the time step at which the unknown word was generated, and replacing the unknown word with that word.
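The unknown-word replacement rule above can be illustrated with a short sketch. It assumes the decoder exposes one attention distribution over the input tokens for each generated token, and that unknown words are emitted as a literal `<unk>` token; both the token string and the function name are assumptions, not taken from the patent.

```python
def replace_unknown_words(source_tokens, output_tokens, attention, unk="<unk>"):
    """Replace each unknown token in the decoder output with the source token
    that received the highest attention weight at that decoding step.

    attention[t][j] is the weight on source token j when output token t was
    generated (one row per output token).
    """
    result = []
    for t, token in enumerate(output_tokens):
        if token == unk:
            row = attention[t]
            # Index of the most-attended source token at step t.
            best_j = max(range(len(row)), key=lambda j: row[j])
            result.append(source_tokens[best_j])
        else:
            result.append(token)
    return result
```

Because the replacement word is copied from the input sentence, rare words such as names that fall outside the model's vocabulary can still appear in the compressed sentence.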
The sentence summarization model is built on an attention mechanism and is embedded in an Encoder-Decoder framework, a common paradigm in deep learning. The Encoder (encoding end) encodes the input sentence and converts it, through nonlinear transformations, into an intermediate semantic representation; the Decoder (decoding end) generates the word for each time step from this intermediate representation and the words it has already generated. When an unknown word appears in the generated sentence, the word with the highest attention value at the time step at which the unknown word was generated is acquired and replaces the unknown word, which improves the readability of the summary.
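The patent does not specify the attention scoring function. As an assumption, the sketch below uses dot-product scoring for a single decoding step: the decoder state is compared with each encoder state, the scores are normalized with a softmax into attention weights, and the context vector is the attention-weighted sum of the encoder states. These per-step weights are the values over which the unknown-word rule takes its maximum.

```python
import math


def dot_product_attention(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoding step.

    decoder_state: list of d floats.
    encoder_states: one list of d floats per source token.
    """
    # Score each source position by its dot product with the decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    # Numerically stable softmax over source positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context
```

Subtracting the maximum score before exponentiating does not change the softmax result but avoids overflow for large scores.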
Before the first key sentences are compressed to obtain the second key sentences, the sentence summarization model is trained as follows:
identifying entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
and training the sentence summarization model on the tag-replaced text data set until it converges; the text data set may be, for example, the Gigaword data set.
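When preparing training data, one tag table should be shared between a source sentence and its reference headline, so that the same entity or number receives the same tag on both sides and the model can learn to copy tags such as `entity-1` from input to output. The sketch below shows this for a single pair; the entity list again stands in for NER output, and the function name `tag_pair` and the tag format are assumptions.

```python
import re

# Numbers not already inside a tag such as "entity-1".
NUMBER_RE = re.compile(r"(?<![\w-])\d+(?:[.,]\d+)*(?!\w)")


def tag_pair(source, target, entities):
    """Tag a (sentence, headline) pair with one shared tag table."""
    table = {}
    counts = {"entity": 0, "number": 0}

    def tag_for(value, kind):
        # Same (kind, value) always maps to the same tag within the pair.
        key = (kind, value)
        if key not in table:
            counts[kind] += 1
            table[key] = f"{kind}-{counts[kind]}"
        return table[key]

    def apply(text):
        # Longer entities first so "New York" is tagged before "York";
        # alphabetical tie-break keeps the numbering deterministic.
        for ent in sorted(set(entities), key=lambda e: (-len(e), e)):
            pattern = r"(?<!\w)" + re.escape(ent) + r"(?!\w)"
            if re.search(pattern, text):
                text = re.sub(pattern, tag_for(ent, "entity"), text)
        return NUMBER_RE.sub(lambda m: tag_for(m.group(0), "number"), text)

    return apply(source), apply(target)
```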
Step S104: judge whether the length of the first key sentence is greater than or equal to the preset length threshold; if so, execute step S105; if not, execute step S106.
To obtain a more robust summary, i.e., one that is as readable as possible while retaining a reasonable degree of factual fidelity, the method judges whether the length of the first key sentence is greater than or equal to a preset length threshold and acts on the result.
Step S105: take the second key sentence corresponding to the first key sentence as the first key sentence to be synthesized.
If the length of the first key sentence is greater than or equal to the preset length threshold, its corresponding second key sentence is used as the first key sentence to be synthesized, which keeps the final summary at a reasonable length and improves readability.
Step S106: take the first key sentence directly as the first key sentence to be synthesized.
If the length of the first key sentence is smaller than the preset length threshold, the extracted first key sentence is considered short enough for the final summary and is used directly as the first key sentence to be synthesized.
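Steps S104 to S106 reduce to a per-sentence choice, sketched below. The sketch assumes length is measured in whitespace-separated words and treats the threshold as a free parameter, since the patent gives neither a unit nor a value; the function name is hypothetical.

```python
def choose_sentences_to_synthesize(first_key_sentences, second_key_sentences, length_threshold):
    """For each first key sentence, pick either itself or its compressed form."""
    chosen = []
    for first, second in zip(first_key_sentences, second_key_sentences):
        if len(first.split()) >= length_threshold:
            chosen.append(second)  # S105: long sentence, use the compressed version
        else:
            chosen.append(first)  # S106: short sentence, keep it as extracted
    return chosen
```

Keeping short sentences verbatim avoids running them through the generator, where they gain little compression but risk factual errors.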
Step S107: generate the summary of the document according to all the first key sentences to be synthesized.
Specifically, the tags in each first key sentence to be synthesized are restored to the corresponding entities and numbers to obtain the corresponding second key sentences to be synthesized, and these are arranged in the order of their corresponding original sentences in the document to form the summary of the document.
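The restoration and assembly step can be sketched as follows, assuming the tag-to-original mapping recorded during tag replacement is available and each sentence carries the index of its original sentence in the document. The `entity-N`/`number-N` tag format and the function names are assumptions.

```python
import re

# Matches tags of the assumed form "entity-N" / "number-N".
TAG_RE = re.compile(r"\b(?:entity|number)-\d+\b")


def restore_tags(sentence, mapping):
    """Replace every known tag with its original entity or number."""
    return TAG_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), sentence)


def assemble_summary(positioned_sentences, mapping):
    """positioned_sentences: (original sentence index, sentence) pairs."""
    ordered = sorted(positioned_sentences, key=lambda pair: pair[0])
    return " ".join(restore_tags(sentence, mapping) for _, sentence in ordered)
```

Unknown tags are left unchanged rather than raising an error, so a tag the generator invents (for example `entity-9` when only `entity-1` exists) stays visible in the output.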
Attached Table 1 shows, by way of example, the ROUGE values of the hybrid summary generation method and of the sequence-to-sequence model with attention (S2S + attn) on the CNN/Daily Mail data set (100 randomly sampled documents used as test data). The sentence-title training data set contains 3,803,957 pairs, the validation set 189,651 pairs and the test set 1,951 pairs. As Table 1 shows, the hybrid method of this embodiment significantly improves two metrics, ROUGE-1 and ROUGE-L. In addition, the sentence summarization model of this embodiment is trained on the Gigaword data set following the idea of transfer learning, whereas the existing S2S + attn model is trained on the CNN/Daily Mail data set, which indicates that the model of this embodiment transfers better across data sets.
Attached Table 1: Comparison of ROUGE values between the present invention and the sequence-to-sequence model with attention (S2S + attn)
(Table 1 image not reproduced)
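For readers unfamiliar with the metrics reported in Table 1: ROUGE-1 measures unigram overlap between a candidate summary and a reference summary, and ROUGE-L measures their longest common subsequence. The simplified sketch below computes F1 versions with whitespace tokenization and no stemming; official ROUGE toolkits add preprocessing options and may weight recall differently, so its numbers are not comparable to those in Table 1.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """LCS-based F1 between two whitespace-tokenized strings."""
    a, b = candidate.split(), reference.split()
    lcs = lcs_length(a, b)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(b)
    return 2 * precision * recall / (precision + recall)
```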
Fig. 2 illustrates the main framework of the hybrid summary generation method in this embodiment. As shown in Fig. 2, the framework is as follows:
first, extractive summarization is applied to the original document to obtain a plurality of first key sentences; the first key sentences are then compressed by the pre-constructed sentence summarization model to obtain the corresponding second key sentences; finally, according to the comparison between the length of each first key sentence and the preset length threshold, either the first key sentence or its second key sentence is selected as the first key sentence to be synthesized, and the summary of the document is generated from all the first key sentences to be synthesized.
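The framework of Fig. 2 can be summarized as a small driver that wires the steps together. This is a sketch only: the tagger, extractor, compressor and restorer are passed in as functions (in the usage below, stubs stand in for the NER tool, the submodular extractor and the trained sentence summarization model), and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def hybrid_summarize(
    sentences: List[str],
    tag: Callable[[List[str]], Tuple[List[str], Dict[str, str]]],
    extract: Callable[[List[str]], List[int]],
    compress: Callable[[str], str],
    restore: Callable[[str, Dict[str, str]], str],
    length_threshold: int,
) -> str:
    tagged, mapping = tag(sentences)  # S101: entities and numbers -> tags
    parts = []
    for i in sorted(extract(tagged)):  # S102: key sentences in document order
        first = tagged[i]
        second = compress(first)  # S103: compressed version
        # S104-S106: long sentences use the compressed version.
        chosen = second if len(first.split()) >= length_threshold else first
        parts.append(restore(chosen, mapping))  # S107: tags -> original strings
    return " ".join(parts)
```

For example, with a stub compressor that keeps the first four words and a threshold of 6, a 12-word first key sentence is replaced by its compressed form while a 4-word one is kept as extracted.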
The hybrid summary generation method provided by the invention combines the advantages of extractive and abstractive summarization: it generates summaries that conform to the semantics of the document while ensuring readability. Extracting first key sentences from the document with the extractive method filters out unimportant content, so that the abstractive stage can then quickly generate the document summary, yielding a high-precision summary.
Although the foregoing embodiments describe the steps in the above order, those skilled in the art will understand that, to achieve the effect of these embodiments, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations fall within the protection scope of the present invention.
Those of skill in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing or implying any particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (6)

1. A hybrid extractive-abstractive summary generation method, characterized by comprising the following steps:
identifying entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method;
compressing each of the first key sentences to obtain a second key sentence corresponding to each first key sentence;
judging whether the length of the first key sentence is greater than or equal to a preset length threshold: if so, taking the second key sentence corresponding to the first key sentence as a first key sentence to be synthesized; if not, directly taking the first key sentence as the first key sentence to be synthesized;
and generating the summary of the document according to all the first key sentences to be synthesized.
2. The hybrid extractive-abstractive summary generation method according to claim 1, wherein the step of extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method comprises:
extracting a plurality of first key sentences from the tag-replaced document by using an extractive document summarization method based on a submodular function;
acquiring, for each first key sentence, the corresponding original key sentence in the document before tag replacement;
and sorting the first key sentences according to the order of the corresponding original key sentences in the document before tag replacement.
3. The hybrid extractive-abstractive summary generation method according to claim 1, wherein the step of compressing each of the first key sentences to obtain a second key sentence corresponding to each first key sentence comprises:
compressing the first key sentence with a pre-constructed sentence summarization model to obtain the corresponding second key sentence;
wherein the sentence summarization model is a model built on an attention mechanism.
4. The hybrid extractive-abstractive summary generation method according to claim 3, wherein the step of compressing the first key sentence with a pre-constructed sentence summarization model to obtain the corresponding second key sentence comprises:
acquiring unknown words generated when the first key sentence is compressed;
acquiring the word with the highest attention value at the time step at which each unknown word was generated, and replacing the unknown word with the acquired word.
5. The hybrid extractive-abstractive summary generation method according to claim 4, wherein, before the step of compressing each of the first key sentences to obtain the second key sentence corresponding to each first key sentence, the method further comprises:
identifying entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
and training the sentence summarization model on the tag-replaced text data set.
6. The hybrid extractive-abstractive summary generation method according to any one of claims 1 to 5, wherein the step of generating the summary of the document according to all the first key sentences to be synthesized comprises:
restoring the tags in the first key sentences to be synthesized to the corresponding entities and numbers to obtain corresponding second key sentences to be synthesized;
and generating the summary of the document according to the second key sentences to be synthesized.
CN201811238086.6A 2018-10-23 2018-10-23 Extraction generation mixed abstract generation method Active CN109597886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811238086.6A CN109597886B (en) 2018-10-23 2018-10-23 Extraction generation mixed abstract generation method

Publications (2)

Publication Number Publication Date
CN109597886A (en) 2019-04-09
CN109597886B (en) 2021-07-06

Family

ID=65957961

Country Status (1)

Country Link
CN (1) CN109597886B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119444B (en) * 2019-04-23 2023-06-30 中电科大数据研究院有限公司 Drawing type and generating type combined document abstract generating model
CN111026861B (en) * 2019-12-10 2023-07-04 腾讯科技(深圳)有限公司 Text abstract generation method, training device, training equipment and medium
CN113011160A (en) * 2019-12-19 2021-06-22 中国移动通信有限公司研究院 Text abstract generation method, device, equipment and storage medium
CN111581358B (en) * 2020-04-08 2023-08-18 北京百度网讯科技有限公司 Information extraction method and device and electronic equipment
CN111858913A (en) * 2020-07-08 2020-10-30 北京嘀嘀无限科技发展有限公司 Method and system for automatically generating text abstract
CN112732901B (en) * 2021-01-15 2024-05-28 联想(北京)有限公司 Digest generation method, digest generation device, computer-readable storage medium, and electronic device
CN113032552B (en) * 2021-05-25 2021-08-27 南京鸿程信息科技有限公司 Text abstract-based policy key point extraction method and system
CN113836892B (en) * 2021-09-08 2023-08-08 灵犀量子(北京)医疗科技有限公司 Sample size data extraction method and device, electronic equipment and storage medium
CN116205234A (en) * 2023-04-24 2023-06-02 中国电子科技集团公司第二十八研究所 Text recognition and generation algorithm based on deep learning

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1609845A (en) * 2003-10-22 2005-04-27 国际商业机器公司 Method and apparatus for improving readability of automatic generated abstract by machine
CN104503958A (en) * 2014-11-19 2015-04-08 百度在线网络技术(北京)有限公司 Method and device for generating document summarization
CN108228541A (en) * 2016-12-22 2018-06-29 深圳市北科瑞声科技股份有限公司 The method and apparatus for generating documentation summary

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8930376B2 (en) * 2008-02-15 2015-01-06 Yahoo! Inc. Search result abstract quality using community metadata

Non-Patent Citations (3)

Title
Automatic summarization technology for texts on the Internet; Yin Cunyan et al.; Computer Engineering; 2006-02-28; Vol. 32, No. 3; pp. 88-90 *
Pointing the Unknown Words; Caglar Gulcehre et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; pp. 140-149 *
Extractive document summarization methods (Part 1); 仲夏199603; https://www.pianshen.com/article/52201321841/it610; 2017-11-28; pp. 1-7 *

Also Published As

Publication number Publication date
CN109597886A (en) 2019-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant