CN109597886B - Extraction generation mixed abstract generation method - Google Patents
Extraction generation mixed abstract generation method
- Publication number
- CN109597886B (application CN201811238086.6A)
- Authority
- CN
- China
- Prior art keywords
- abstract
- key
- sentence
- document
- key sentence
- Prior art date
- 2018-10-23
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention belongs to the field of natural language processing and provides an extraction generation hybrid abstract generation method, aiming to overcome the shortcomings of existing extractive and generative automatic summarization methods. The method comprises: identifying entities and numbers in a document and replacing them with preset tags; extracting a plurality of first key sentences from the tag-replaced document by an extractive document abstract extraction method; compressing each first key sentence to obtain a corresponding second key sentence; selecting, according to the comparison between the length of a first key sentence and a preset length threshold, either the first key sentence or its second key sentence as a first key sentence to be synthesized; and generating the abstract of the document from all the first key sentences to be synthesized. The method can generate an abstract that conforms to the semantic expression of the document while also ensuring readability.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and in particular relates to an extraction generation hybrid abstract generation method.
Background
Automatic summarization is a technology that uses a computer system to analyze text, summarize its content, and automatically generate an abstract, expressing the main content of the original text in a concise form according to the needs of readers (or users). Automatic summarization can effectively help readers find content of interest in retrieved articles, improving both reading speed and reading quality. The technique compresses a document into a more compact representation while guaranteeing coverage of the valuable subject matter of the original document.
Existing automatic summarization technology mainly comprises two approaches: extractive automatic summarization and generative automatic summarization. Extractive summarization combines segments extracted from the document into a summary; it is simple to implement and readable, but the precision of the resulting summary is not high. Generative summarization produces the summary directly from the meaning of the document; it is much more difficult but closer to the essence of summarization.
Therefore, a scheme that can filter unimportant text content from a document and preserve the fluency of the abstract while also improving its precision remains a problem to be solved by those skilled in the art.
Disclosure of Invention
In order to solve the above problems in the prior art, namely the shortcomings of the existing extractive and generative automatic summarization methods, the invention provides an extraction generation hybrid abstract generation method, which comprises the following steps:
identifying entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method;
compressing the plurality of first key sentences respectively to obtain a second key sentence corresponding to each first key sentence;
judging whether the length of the first key sentence is greater than or equal to a preset length threshold value: if so, taking a second key sentence corresponding to the first key sentence as a first key sentence to be synthesized; if not, directly taking the first key sentence as the first key sentence to be synthesized;
and generating the abstract of the document according to all the first key sentences to be synthesized.
In a preferred embodiment of the above-mentioned method, the step of "extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method" includes:
extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method based on a Submodular function;
acquiring an original key sentence corresponding to the first key sentence in the document before the label replacement;
and ordering the corresponding first key sentences according to the order of each original key sentence in the document before the label replacement.
In a preferred technical solution of the foregoing solution, the step of "respectively compressing the plurality of first key sentences to obtain a second key sentence corresponding to each first key sentence" includes:
compressing the first key sentence based on a pre-constructed sentence abstract model to obtain a corresponding second key sentence;
wherein the sentence abstract model is a model constructed based on an attention mechanism.
In a preferred technical solution of the above-mentioned solution, the step of "compressing the first key sentence based on a sentence summarization model constructed in advance to obtain a corresponding second key sentence" includes:
acquiring unknown words generated when the first key sentence is compressed;
acquiring the word with the highest attention value at the generation time of the unknown word and replacing the unknown word with the acquired word with the highest attention value.
In a preferred technical solution of the foregoing method, before the step of "respectively compressing the plurality of first key sentences to obtain a second key sentence corresponding to each first key sentence", the method further includes:
identifying entities and numbers in a preset text data set;
replacing entities and numbers in the text data set by using a preset label;
and performing model training on the sentence abstract model according to the text data set subjected to label replacement.
In a preferred embodiment of the foregoing solution, the step of generating the abstract of the document according to all the first key sentences to be synthesized includes:
restoring the labels in the first key sentence to be synthesized into corresponding entities and numbers to obtain a corresponding second key sentence to be synthesized;
and generating the abstract of the document according to the second key sentence to be synthesized.
Compared with the closest prior art, the technical scheme at least has the following beneficial effects:
1. The extraction generation hybrid abstract generation method provided by the invention extracts the first key sentences with an extractive document abstract extraction method, compresses each first key sentence to obtain a second key sentence, selects either the first key sentence or the second key sentence as the first key sentence to be synthesized according to the comparison between the length of the first key sentence and a preset length threshold, and generates the document abstract from the first key sentences to be synthesized, thereby combining the strengths of extractive and generative summarization.
2. The extraction generation hybrid abstract generation method provided by the invention judges whether the length of the first key sentence is greater than or equal to a preset length threshold; if so, the second key sentence corresponding to the first key sentence is used as the first key sentence to be synthesized, and if not, the first key sentence itself is used. A more robust abstract can thus be obtained subsequently, i.e., readability is ensured as far as possible while maintaining a certain degree of fidelity to the facts.
3. The extraction generation hybrid abstract generation method provided by the invention extracts the first key sentences from the document with an extractive document abstract extraction method and filters out some unimportant text content, so that the abstract of the document can be quickly generated by the generative automatic summarization method at the later stage and a high-precision document abstract can be obtained.
Drawings
Fig. 1 is a schematic diagram illustrating the main steps of a hybrid abstract generating method according to an embodiment of the present invention;
fig. 2 is a schematic main framework diagram of a hybrid abstract generating method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, fig. 1 illustrates the main steps of the hybrid abstract generation method in the present embodiment. As shown in fig. 1, the method for generating a hybrid abstract in this embodiment includes the following steps:
step S101: entities and numbers in the document are identified and replaced with preset tags.
Inspired by the manual summarization process (i.e., extracting some important sentences from the original text and then rewriting them into a condensed summary), the invention generates a text summary of a long text by the extraction generation hybrid abstract generation method. The method not only filters out some unimportant text content by using the extractive document abstract extraction method, but also keeps the fluency of the text abstract produced by the generative document abstract extraction method. The extraction generation hybrid abstract generation method mainly comprises two parts: extracting important sentences from the document, and compressing and rewriting the extracted sentences.
Specifically, entities and numbers in the document may be identified and replaced with preset tags. For example, given the input document:
It’s just an example for illustration. There are 56 nationalities in China.
the document, with its entities and numbers replaced by preset tags, is as follows:
It’s just an example for illustration. There are number-1 nationalities in entity-1.
Named entities may be person names, organization names, place names and other entities identified by name; in a broader sense, entities may also include numbers, dates, currency, addresses and the like. The entities and numbers in the document may be identified with the named entity recognition tool spaCy, and replaced with preset tags through Python regular expressions.
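As a concrete illustration of this masking step, the following sketch uses spaCy's en_core_web_sm model and Python regular expressions; the helper name, the per-type counters and the choice of entity labels treated as "numbers" are our own assumptions rather than the patent's exact implementation.

```python
# Minimal sketch of the entity/number masking step, assuming spaCy's en_core_web_sm
# model is installed. The tag scheme (entity-1, number-1, ...) follows the example
# in the text; everything else here is an illustrative assumption.
import re
import spacy

nlp = spacy.load("en_core_web_sm")

def mask_entities_and_numbers(text):
    """Replace named entities and numbers with preset tags such as entity-1, number-1."""
    doc = nlp(text)
    mapping = {}                            # tag -> original surface form, kept for later restoration
    counters = {"entity": 0, "number": 0}
    masked = text
    for ent in doc.ents:
        kind = "number" if ent.label_ in ("CARDINAL", "ORDINAL", "QUANTITY", "PERCENT", "MONEY") else "entity"
        counters[kind] += 1
        tag = f"{kind}-{counters[kind]}"
        mapping[tag] = ent.text
        # Replace only the first occurrence so repeated mentions keep distinct tags.
        masked = re.sub(re.escape(ent.text), tag, masked, count=1)
    return masked, mapping

masked, mapping = mask_entities_and_numbers(
    "It's just an example for illustration. There are 56 nationalities in China."
)
print(masked)   # e.g. "It's just an example for illustration. There are number-1 nationalities in entity-1."
print(mapping)  # e.g. {'number-1': '56', 'entity-1': 'China'}
```

The returned mapping is kept so that the tags can be restored to the original entities and numbers when the abstract is assembled later.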
Step S102: and extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method.
The extractive document abstract extraction method extracts some representative text segments from the original document to form an abstract; the segments may be sentences, paragraphs or sections of the whole document. Specifically, a plurality of first key sentences may be extracted from the tag-replaced document by using an extractive document abstract extraction method based on a submodular function, the original key sentences corresponding to the first key sentences in the document before the tag replacement are obtained, and the first key sentences are ordered according to the order of their corresponding original key sentences in the document before the tag replacement. The total number of words in the first key sentences is kept below a preset word-count threshold, which may be 200.
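The patent does not spell out which submodular objective it uses; the sketch below illustrates one common choice, a capped word-coverage function maximized greedily under a 200-word budget, with the selected sentences re-sorted into document order as described above. The function name, the objective and the stopping rule are illustrative assumptions.

```python
# Hedged sketch of the extractive step: greedy selection under a monotone submodular
# coverage objective with a ~200-word budget. The word-coverage objective is an
# illustrative choice, not the patent's exact submodular function.
from collections import Counter

def greedy_submodular_extract(sentences, budget_words=200):
    doc_counts = Counter(w.lower() for s in sentences for w in s.split())
    covered = Counter()

    def marginal_gain(idx):
        # Gain in coverage f(S) = sum_w min(count of w in S, count of w in document).
        c = Counter(w.lower() for w in sentences[idx].split())
        return sum(
            min(doc_counts[w], covered[w] + c[w]) - min(doc_counts[w], covered[w])
            for w in c
        )

    selected, remaining, total_words = [], list(range(len(sentences))), 0
    while remaining:
        best = max(remaining, key=marginal_gain)
        length = len(sentences[best].split())
        # For simplicity, stop at the first sentence that adds nothing or breaks the budget.
        if marginal_gain(best) <= 0 or total_words + length > budget_words:
            break
        selected.append(best)
        covered.update(w.lower() for w in sentences[best].split())
        total_words += length
        remaining.remove(best)
    # Re-sort the chosen sentences into their original document order, as described above.
    return [sentences[i] for i in sorted(selected)]
```

Because the coverage function has diminishing returns, the greedy choice of the largest marginal gain is the standard way to optimize it under a length budget.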
Step S103: and respectively compressing the plurality of first key sentences to obtain a second key sentence corresponding to each first key sentence.
Although the plurality of first key sentences extracted by the extractive document abstract extraction method already filter out some unimportant text content, the resulting abstract has low precision. So that the generated abstract better conforms to the meaning of the document and is closer to a manually written abstract, the plurality of first key sentences may be compressed. Specifically, each first key sentence may be compressed based on a pre-constructed sentence abstract model to obtain a corresponding second key sentence, where the sentence abstract model is a model constructed based on an attention mechanism.
The step of compressing the first key sentence based on the pre-constructed sentence abstract model to obtain the corresponding second key sentence comprises the following steps:
acquiring unknown words generated when the first key sentence is compressed;
the word with the highest attention value at the generation time of the unknown word is acquired and the unknown word is replaced with the acquired word with the highest attention value.
The sentence abstract model is a model constructed based on an attention mechanism and follows the encoder-decoder framework, a common paradigm in deep learning. The encoder encodes the input sentence and converts it into an intermediate semantic representation through a nonlinear transformation; the decoder generates the word at each time step from this intermediate semantic representation and the previously generated history. When an unknown word appears in the generated sentence, the source word with the highest attention value at the generation time of the unknown word can be acquired and used to replace the unknown word, thereby improving the readability of the abstract.
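The following is a minimal sketch of the unknown-word replacement just described, assuming the decoder exposes its per-step attention weights over the source tokens; the `<unk>` symbol, the function name and the attention-matrix shape are assumptions for illustration.

```python
# Hedged sketch of unknown-word replacement: whenever the decoder emits an <unk>
# token, copy the source word with the highest attention weight at that decoding step.
import numpy as np

def replace_unknown_words(output_tokens, source_tokens, attention, unk_token="<unk>"):
    """attention: array of shape (len(output_tokens), len(source_tokens))."""
    repaired = []
    for t, token in enumerate(output_tokens):
        if token == unk_token:
            best_src = int(np.argmax(attention[t]))  # most-attended source position at step t
            repaired.append(source_tokens[best_src])
        else:
            repaired.append(token)
    return repaired

source = ["there", "are", "number-1", "nationalities", "in", "entity-1"]
output = ["number-1", "<unk>", "in", "entity-1"]
attention = np.array([
    [0.0, 0.0, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.8, 0.1, 0.0],  # the <unk> step attends mostly to "nationalities"
    [0.0, 0.0, 0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.9],
])
print(replace_unknown_words(output, source, attention))
# ['number-1', 'nationalities', 'in', 'entity-1']
```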
Before the plurality of first key sentences are compressed to obtain second key sentences, the sentence abstract model can be trained; the specific steps are as follows (a small data-preparation sketch follows the list):
identifying entities and numbers in a preset text data set;
replacing entities and numbers in the text data set by using a preset label;
and performing model training on the sentence abstract model according to the text data set subjected to label replacement until the sentence abstract model converges, wherein the text data set can be a Gigaword data set.
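As a small sketch of this training-data preparation, the patent only states that the same tag replacement is applied to the training text before training; the assumption that the corpus is available as (sentence, headline) string pairs and the helper names below are ours.

```python
# Hedged sketch: apply the same entity/number tag replacement to a Gigaword-style
# corpus of (sentence, headline) pairs before training the sentence abstract model.
# `mask_fn` stands for a masking helper such as the one sketched earlier.

def prepare_training_pairs(pairs, mask_fn):
    """pairs: iterable of (source_sentence, headline) strings.
    mask_fn: callable returning (masked_text, tag_mapping)."""
    prepared = []
    for source, headline in pairs:
        masked_source, _ = mask_fn(source)
        masked_headline, _ = mask_fn(headline)
        prepared.append((masked_source, masked_headline))
    return prepared
```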
Step S104: judging whether the length of the first key sentence is greater than or equal to a preset length threshold value, if so, executing a step S105; if not, go to step S106.
In order to obtain a more robust abstract, that is, to ensure readability as far as possible while maintaining a certain degree of fidelity to the facts, it is determined whether the length of the first key sentence is greater than or equal to a preset length threshold, and the corresponding operation is performed according to the determination result.
Step S105: and taking a second key sentence corresponding to the first key sentence as a first key sentence to be synthesized.
If the length of the first key sentence is greater than or equal to the preset length threshold, then in order to keep the number of words in the final abstract at a reasonable length and improve readability, the second key sentence corresponding to the first key sentence is used as the first key sentence to be synthesized.
Step S106: and directly taking the first key sentence as the first key sentence to be synthesized.
If the length of the first key sentence is smaller than the preset length threshold, the first key sentence extracted from the document can be considered to already meet the word-count requirement of the final abstract, and the first key sentence is directly used as the first key sentence to be synthesized.
Step S107: and generating an abstract of the document according to all the first key sentences to be synthesized.
Specifically, the tags in each first key sentence to be synthesized may be restored to the corresponding entities and numbers to obtain a corresponding second key sentence to be synthesized, and the second key sentences to be synthesized are arranged according to the order of their corresponding original sentences in the document to generate the abstract of the document.
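A minimal sketch of steps S104 through S107 is given below, assuming the tag-to-text mapping saved during masking is available; the 20-word threshold and all helper names are illustrative assumptions, not values fixed by the patent.

```python
# Hedged sketch of steps S104-S107: choose the first or second key sentence by a
# length threshold, restore the entity/number tags, and assemble the abstract in
# original document order.
import re

def assemble_abstract(first_sentences, second_sentences, positions, mapping, length_threshold=20):
    """first_sentences / second_sentences: parallel lists of tagged sentences.
    positions: original position of each sentence in the source document.
    mapping: tag -> original text, e.g. {'entity-1': 'China', 'number-1': '56'}."""
    chosen = []
    for first, second, pos in zip(first_sentences, second_sentences, positions):
        # Step S104: compare the first key sentence's length with the threshold.
        sentence = second if len(first.split()) >= length_threshold else first
        # Step S107: restore the preset tags to the original entities and numbers.
        restored = re.sub(r"\b(?:entity|number)-\d+\b",
                          lambda m: mapping.get(m.group(0), m.group(0)),
                          sentence)
        chosen.append((pos, restored))
    # Arrange the selected sentences in their original document order.
    return " ".join(text for _, text in sorted(chosen))

print(assemble_abstract(
    first_sentences=["There are number-1 nationalities in entity-1."],
    second_sentences=["number-1 nationalities in entity-1."],
    positions=[0],
    mapping={"number-1": "56", "entity-1": "China"},
))  # "There are 56 nationalities in China."  (kept as-is: shorter than the threshold)
```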
Referring to attached Table 1, Table 1 exemplarily shows the ROUGE values of the hybrid abstract generation method and of a sequence-to-sequence attention (S2S + attn) model on the CNN/DailyMail data set (100 documents randomly sampled as test data). The sentence-headline training data set comprises 3,803,957 data pairs, the validation data set comprises 189,651 data pairs, and the test data set comprises 1,951 data pairs. As can be seen from attached Table 1, the extraction generation hybrid abstract generation method of this embodiment significantly improves the two indices ROUGE-1 and ROUGE-L. In addition, the sentence abstract model of this embodiment is trained on the Gigaword data set following the idea of transfer learning, while the existing S2S + attn model is trained on the CNN/Daily Mail data set, so the model of this embodiment of the invention transfers better.
Attached Table 1: ROUGE value comparison between the present invention and the sequence-to-sequence attention model (S2S + attn)
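The patent does not name the scoring tool used for Table 1; as an illustration, ROUGE-1 and ROUGE-L scores of this kind can be computed with the open-source rouge-score package as in the sketch below. The package choice and the example texts are assumptions.

```python
# Hedged sketch of the evaluation: ROUGE-1 and ROUGE-L between a generated abstract
# and a reference, using the rouge-score package (an assumption; the patent does not
# specify the scoring tool, and the texts below are placeholders).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "There are 56 nationalities in China."
generated = "China has 56 nationalities."
scores = scorer.score(reference, generated)

for name, score in scores.items():
    print(f"{name}: P={score.precision:.3f} R={score.recall:.3f} F1={score.fmeasure:.3f}")
```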
Referring to fig. 2, fig. 2 illustrates the main framework of the hybrid abstract generation method in the present embodiment. As shown in fig. 2, the main framework of the hybrid abstract generating method in this embodiment is as follows:
first, a plurality of first key sentences are extracted from the original document; then, the first key sentences are compressed by the pre-constructed sentence abstract model to obtain corresponding second key sentences; finally, either the first key sentence or the second key sentence is selected as the first key sentence to be synthesized according to the comparison between the length of the first key sentence and the preset length threshold, and the abstract of the document is generated from all the first key sentences to be synthesized.
The extraction generation hybrid abstract generation method provided by the invention combines the advantages of the extractive document abstract extraction method and the generative document abstract extraction method: it can generate an abstract that conforms to the semantic expression of the document while also ensuring readability. By extracting the first key sentences from the document with the extractive document abstract extraction method, unimportant text content is filtered out, so that the generative automatic summarization method can then quickly produce the abstract of the document and a high-precision document abstract can be obtained.
Although the foregoing embodiments describe the steps in the above sequential order, those skilled in the art will understand that, in order to achieve the effect of the present embodiments, the steps may not be executed in such an order, and may be executed simultaneously (in parallel) or in an inverse order, and these simple variations are within the scope of the present invention.
Those of skill in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing or implying any particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (6)
1. A method for generating a hybrid abstract by extraction generation is characterized by comprising the following steps:
identifying entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method;
compressing the plurality of first key sentences respectively to obtain a second key sentence corresponding to each first key sentence;
judging whether the length of the first key sentence is greater than or equal to a preset length threshold value: if so, taking a second key sentence corresponding to the first key sentence as a first key sentence to be synthesized; if not, directly taking the first key sentence as the first key sentence to be synthesized;
and generating the abstract of the document according to all the first key sentences to be synthesized.
2. The method for generating a hybrid abstract by extraction generation according to claim 1, wherein the step of extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method comprises:
extracting a plurality of first key sentences from the document subjected to label replacement by using an extraction type document abstract extraction method based on a Submodular function;
acquiring an original key sentence corresponding to the first key sentence in the document before the label replacement;
and ordering the corresponding first key sentences according to the order of each original key sentence in the document before the label replacement.
3. The method for generating a hybrid abstract according to claim 1, wherein the step of compressing the plurality of first key sentences to obtain a second key sentence corresponding to each first key sentence comprises:
compressing the first key sentence based on a pre-constructed sentence abstract model to obtain a corresponding second key sentence;
wherein the sentence abstract model is a model constructed based on an attention mechanism.
4. The method for generating a hybrid abstract by extraction generation according to claim 3, wherein the step of compressing the first key sentence based on a pre-constructed sentence abstract model to obtain a corresponding second key sentence comprises:
acquiring unknown words generated when the first key sentence is compressed;
acquiring the word with the highest attention value at the generation time of the unknown word and replacing the unknown word with the acquired word with the highest attention value.
5. The method for generating hybrid abstract as claimed in claim 4, wherein before the step of compressing the plurality of first key sentences respectively to obtain the second key sentence corresponding to each first key sentence, the method further comprises:
identifying entities and numbers in a preset text data set;
replacing entities and numbers in the text data set by using a preset label;
and performing model training on the sentence abstract model according to the text data set subjected to label replacement.
6. The method for generating a hybrid abstract by extraction generation according to any one of claims 1 to 5, wherein the step of generating the abstract of the document according to all the first key sentences to be synthesized comprises:
restoring the labels in the first key sentence to be synthesized into corresponding entities and numbers to obtain a corresponding second key sentence to be synthesized;
and generating the abstract of the document according to the second key sentence to be synthesized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811238086.6A CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811238086.6A CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109597886A CN109597886A (en) | 2019-04-09 |
CN109597886B true CN109597886B (en) | 2021-07-06 |
Family
ID=65957961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811238086.6A Active CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109597886B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119444B (en) * | 2019-04-23 | 2023-06-30 | 中电科大数据研究院有限公司 | Drawing type and generating type combined document abstract generating model |
CN111026861B (en) * | 2019-12-10 | 2023-07-04 | 腾讯科技(深圳)有限公司 | Text abstract generation method, training device, training equipment and medium |
CN113011160A (en) * | 2019-12-19 | 2021-06-22 | 中国移动通信有限公司研究院 | Text abstract generation method, device, equipment and storage medium |
CN111581358B (en) * | 2020-04-08 | 2023-08-18 | 北京百度网讯科技有限公司 | Information extraction method and device and electronic equipment |
CN111858913A (en) * | 2020-07-08 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Method and system for automatically generating text abstract |
CN112732901B (en) * | 2021-01-15 | 2024-05-28 | 联想(北京)有限公司 | Digest generation method, digest generation device, computer-readable storage medium, and electronic device |
CN113032552B (en) * | 2021-05-25 | 2021-08-27 | 南京鸿程信息科技有限公司 | Text abstract-based policy key point extraction method and system |
CN113836892B (en) * | 2021-09-08 | 2023-08-08 | 灵犀量子(北京)医疗科技有限公司 | Sample size data extraction method and device, electronic equipment and storage medium |
CN116205234A (en) * | 2023-04-24 | 2023-06-02 | 中国电子科技集团公司第二十八研究所 | Text recognition and generation algorithm based on deep learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8930376B2 (en) * | 2008-02-15 | 2015-01-06 | Yahoo! Inc. | Search result abstract quality using community metadata |
- 2018-10-23: application CN201811238086.6A filed in China (granted as CN109597886B, status active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1609845A (en) * | 2003-10-22 | 2005-04-27 | 国际商业机器公司 | Method and apparatus for improving readability of automatic generated abstract by machine |
CN104503958A (en) * | 2014-11-19 | 2015-04-08 | 百度在线网络技术(北京)有限公司 | Method and device for generating document summarization |
CN108228541A (en) * | 2016-12-22 | 2018-06-29 | 深圳市北科瑞声科技股份有限公司 | The method and apparatus for generating documentation summary |
Non-Patent Citations (3)
Title |
---|
Automatic Summarization Technology for Texts on the Internet; Yin Cunyan et al.; Computer Engineering; 2006-02-28; Vol. 32, No. 3; pp. 88-90 *
Pointing the Unknown Words; Caglar Gulcehre et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; pp. 140-149 *
Extractive Document Summarization Methods (Part 1); 仲夏199603; https://www.pianshen.com/article/52201321841/it610; 2017-11-28; pp. 1-7 *
Also Published As
Publication number | Publication date |
---|---|
CN109597886A (en) | 2019-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109597886B (en) | Extraction generation mixed abstract generation method | |
CN107301244B (en) | Method, apparatus, system and the trade mark memory of a kind of trade mark point card processing | |
CN109933796B (en) | Method and device for extracting key information of bulletin text | |
CN113961685A (en) | Information extraction method and device | |
CN107368474A (en) | A kind of automatical and efficient translation conversion method of Chinese to braille | |
CN111859919A (en) | Text error correction model training method and device, electronic equipment and storage medium | |
CN104199871A (en) | High-speed test question inputting method for intelligent teaching | |
CN110516203B (en) | Dispute focus analysis method, device, electronic equipment and computer-readable medium | |
CN112016320A (en) | English punctuation adding method, system and equipment based on data enhancement | |
CN112686044A (en) | Medical entity zero sample classification method based on language model | |
CN111563372B (en) | Typesetting document content self-duplication checking method based on teaching book publishing | |
CN105488471B (en) | A kind of font recognition methods and device | |
CN112749283A (en) | Entity relationship joint extraction method for legal field | |
Volk et al. | Nunc profana tractemus. Detecting code-switching in a large corpus of 16th century letters | |
CN113779345B (en) | Teaching material generation method and device, computer equipment and storage medium | |
CN114239554A (en) | Text sentence-breaking method, text sentence-breaking training device, electronic equipment and storage medium | |
CN111291569B (en) | Training method and device for multi-class entity recognition model | |
Alkhazi et al. | Classifying and segmenting classical and modern standard Arabic using minimum cross-entropy | |
CN112036330A (en) | Text recognition method, text recognition device and readable storage medium | |
Marcińczuk et al. | Structure annotation in the polish corpus of suicide notes | |
Bień | The IMPACT project Polish Ground-Truth texts as a DjVu corpus | |
CN112580303A (en) | Punctuation adding system | |
Aranta et al. | Utilization Of Hexadecimal Numbers In Optimization Of Balinese Transliteration String Replacement Method | |
CN110888976B (en) | Text abstract generation method and device | |
CN110889289B (en) | Information accuracy evaluation method, device, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||