CN109597886A - Extraction-generation hybrid summary generation method - Google Patents
Extraction-generation hybrid summary generation method
- Publication number
- CN109597886A (application number CN201811238086.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- critical
- document
- critical sentence
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention belongs to the field of natural language technology and provides an extraction-generation hybrid summary generation method, intended to solve the problems of existing extractive and generative automatic summarization methods. The method comprises: identifying the entities and numbers in a document and replacing them with preset tags; extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method; compressing each first critical sentence to obtain its corresponding second critical sentence; comparing the length of each first critical sentence with a preset length threshold and, according to the result, selectively using either the first critical sentence or the second critical sentence as the first critical sentence to be synthesized; and generating the summary of the document from all first critical sentences to be synthesized. The method can generate a summary that conforms to the semantics of the document while also ensuring readability.
Description
Technical field
The invention belongs to the field of natural language technology, and in particular relates to an extraction-generation hybrid summary generation method.
Background
Automatic summarization is a technology that uses a computer system to automatically analyze text, condense its content, and generate a summary; it can express the main content of the original text concisely according to the requirements of the reader (or user). Automatic summarization effectively helps readers (or users) find content of interest among retrieved articles, improving reading speed and quality. The technology compresses a document into a more concise expression while ensuring that the valuable topics of the original document are covered.
Existing automatic summarization technology mainly comprises two approaches: extractive methods and generative (abstractive) methods. An extractive method forms the summary from fragments extracted from the document; it is simple to implement and produces readable output, but the resulting summary is not very precise. A generative method produces the summary directly from a representation of the document's meaning; it is more difficult, but closer to the essence of summarization.
Therefore, how to provide a scheme that can filter out unimportant text content in a document and preserve the fluency of the summary, while also improving the precision of the summary, is a problem that those skilled in the art currently need to solve.
Summary of the invention
To solve the above problems in the prior art, namely the problems of existing extractive and generative automatic summarization methods, the present invention provides an extraction-generation hybrid summary generation method, comprising:
identifying the entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method;
compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence;
judging whether the length of the first critical sentence is greater than or equal to a preset length threshold: if so, using the second critical sentence corresponding to the first critical sentence as the first critical sentence to be synthesized; if not, directly using the first critical sentence as the first critical sentence to be synthesized;
generating the summary of the document from all first critical sentences to be synthesized.
In a preferred technical solution of the above scheme, the step of "extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method" comprises:
extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method based on a submodular function;
obtaining, from the document before tag replacement, the original critical sentence corresponding to each first critical sentence;
sorting the first critical sentences according to the order in which their original critical sentences appear in the document before tag replacement.
In a preferred technical solution of the above scheme, the step of "compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence" comprises:
compressing the first critical sentence based on a pre-built sentence summarization model to obtain the corresponding second critical sentence;
wherein the sentence summarization model is a model built on an attention mechanism.
In a preferred technical solution of the above scheme, the step of "compressing the first critical sentence based on a pre-built sentence summarization model to obtain the corresponding second critical sentence" comprises:
obtaining the out-of-vocabulary (unregistered) words generated when the first critical sentence is compressed;
obtaining the word with the highest attention value at the time step at which the out-of-vocabulary word was generated, and replacing the out-of-vocabulary word with that word.
In a preferred technical solution of the above scheme, before the step of "compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence", the method further comprises:
identifying the entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
training the sentence summarization model on the tag-replaced text data set.
In a preferred technical solution of the above scheme, the step of "generating the summary of the document from all first critical sentences to be synthesized" comprises:
restoring the tags in each first critical sentence to be synthesized to the corresponding entities and numbers to obtain the corresponding second critical sentence to be synthesized;
generating the summary of the document from the second critical sentences to be synthesized.
Compared with the closest prior art, the above technical solution has at least the following beneficial effects:
1. The extraction-generation hybrid summary generation method provided by the invention extracts first critical sentences with an extractive document summarization method, compresses each first critical sentence to obtain a second critical sentence, selectively uses the first or the second critical sentence as the first critical sentence to be synthesized according to the comparison between the length of the first critical sentence and a preset length threshold, and generates the document summary from the first critical sentences to be synthesized. It thereby combines the advantages of extractive and generative document summarization: it can generate a summary that conforms to the semantics of the document while also ensuring readability.
2. The method judges whether the length of the first critical sentence is greater than or equal to the preset length threshold; if so, the second critical sentence corresponding to the first critical sentence is used as the first critical sentence to be synthesized, and if not, the first critical sentence is used directly. This yields a more robust summary, that is, one that stays reasonably faithful to the facts while remaining as readable as possible.
3. The method first extracts the first critical sentences from the document with an extractive document summarization method, which filters out less important text content, so that the generative summarization method can subsequently generate the summary of the document quickly and a high-precision document summary is obtained.
Brief description of the drawings
Fig. 1 is a schematic diagram of the main steps of the extraction-generation hybrid summary generation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main framework of the extraction-generation hybrid summary generation method according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort shall fall within the protection scope of the present invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principle of the invention and are not intended to limit its protection scope.
Referring to Fig. 1, which illustrates the main steps of the extraction-generation hybrid summary generation method in this embodiment. As shown in Fig. 1, the method in this embodiment includes the following steps:
Step S101: identify the entities and numbers in the document and replace them with preset tags.
Inspired by the way people write summaries by hand (first extracting some important sentences from the original text, then condensing and rewriting them), the present invention generates text summaries of long texts with an extraction-generation hybrid method. The method can use extractive document summarization to filter out less important text content while retaining the fluency of the summaries produced by generative document summarization. The method mainly consists of two parts: extracting important sentences from the document, and compressing and rewriting the extracted sentences.
Specifically, the entities and numbers in the document can be identified and replaced with preset tags. Suppose the following input document is given:
It’s just an example for illustration. There are 56 nationalities in China.
After the entities and numbers in the document are replaced with preset tags, the document reads:
It’s just an example for illustration. There are number-1 nationalities in entity-1.
Here the suffix n in entity-n and number-n denotes the index of the entity or number in the entity set and the number set of the original document, respectively. Named entities may be person names, organization names, place names, and any other entities identified by a name; in a broader sense, entities may also include numbers, dates, currencies, addresses, and so on. The entities and numbers in the document can be identified with the named entity recognition tool spaCy, and replaced with preset tags using Python regular expressions.
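The tag replacement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `replace_with_tags` is hypothetical, the entity strings are assumed to have been produced beforehand by a named entity recognizer (they are passed in as a plain list here), and numbers are detected with a simple regular expression.

```python
import re


def replace_with_tags(text, entities):
    """Replace known entity strings and all numbers with indexed tags.

    `entities` stands in for the output of a named entity recognizer.
    Returns the tagged text and a tag -> original string mapping.
    """
    parts = []
    if entities:
        # Longest first, so a short entity never shadows a longer one.
        ordered = sorted(set(entities), key=len, reverse=True)
        parts.append("(?P<entity>" + "|".join(re.escape(e) for e in ordered) + ")")
    parts.append(r"(?P<number>\d+(?:\.\d+)?)")
    pattern = re.compile("|".join(parts))

    tags = {}                               # (kind, surface form) -> tag
    counters = {"entity": 0, "number": 0}   # next index per tag kind
    mapping = {}                            # tag -> surface form

    def to_tag(match):
        kind, surface = match.lastgroup, match.group(0)
        if (kind, surface) not in tags:
            counters[kind] += 1
            tag = f"{kind}-{counters[kind]}"
            tags[(kind, surface)] = tag
            mapping[tag] = surface
        return tags[(kind, surface)]

    # A single pass over the original text, so digits inside freshly
    # inserted tags are never matched again.
    return pattern.sub(to_tag, text), mapping


tagged, mapping = replace_with_tags(
    "There are 56 nationalities in China.", ["China"]
)
print(tagged)
print(mapping)
```

Keeping the tag-to-string mapping alongside the tagged text is a design choice made here so that the tags can be restored when the final summary is assembled.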
Step S102: extract multiple first critical sentences from the tag-replaced document using an extractive document summarization method.
An extractive document summarization method extracts representative text fragments from the original document to form the summary; these fragments may be sentences, paragraphs, or subsections of the document. Specifically, an extractive document summarization method based on a submodular function can be used to extract multiple first critical sentences from the tag-replaced document. The original critical sentence corresponding to each first critical sentence is then obtained from the document before tag replacement, and the first critical sentences are sorted according to the order in which their original critical sentences appear in that document. The total number of words in the multiple first critical sentences is less than a preset word-count threshold, which may be 200.
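A greedy selection under a word budget is one common way to work with a submodular objective, and can be sketched as below. The source only states that the extractor is "based on a submodular function"; the word-coverage objective, the gain-per-word scoring, and the function name `extract_key_sentences` are assumptions made for illustration.

```python
def extract_key_sentences(sentences, word_budget=200):
    """Greedily pick sentences that add the most new words per word spent.

    Word coverage (the number of distinct words covered) is a monotone
    submodular function; it is used here only as an illustrative objective.
    Returns the indices of the chosen sentences in document order.
    """
    covered = set()     # words covered by the sentences chosen so far
    chosen = []         # indices of chosen sentences
    used = 0            # total words in the chosen sentences
    remaining = set(range(len(sentences)))

    while remaining:
        best, best_score = None, 0.0
        for i in sorted(remaining):
            words = sentences[i].lower().split()
            # Keep the total word count strictly below the budget.
            if not words or used + len(words) >= word_budget:
                continue
            gain = len(set(words) - covered)
            score = gain / len(words)
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break       # nothing fits, or nothing adds new words
        words = sentences[best].lower().split()
        covered.update(words)
        used += len(words)
        chosen.append(best)
        remaining.discard(best)

    # Restore document order, as the method sorts by original position.
    return sorted(chosen)


print(extract_key_sentences(["a b c", "a b", "d e f g", "a b c"], word_budget=8))
```

Returning indices rather than sentence strings makes it straightforward to look up the corresponding original sentences in the document before tag replacement.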
Step S103: compress each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence.
Although extracting multiple first critical sentences with an extractive document summarization method filters out less important text content, the precision of the resulting summary is not high. So that the generated summary better conforms to the meaning of the document and comes closer to a manually written summary, the multiple first critical sentences can be compressed. Specifically, each first critical sentence can be compressed based on a pre-built sentence summarization model to obtain the corresponding second critical sentence, where the sentence summarization model is a model built on an attention mechanism.
The step of "compressing the first critical sentence based on a pre-built sentence summarization model to obtain the corresponding second critical sentence" comprises:
obtaining the out-of-vocabulary (unregistered) words generated when the first critical sentence is compressed;
obtaining the word with the highest attention value at the time step at which the out-of-vocabulary word was generated, and replacing the out-of-vocabulary word with that word.
The sentence summarization model is built on an attention mechanism within an Encoder-Decoder framework, which can be regarded as a research paradigm in the field of deep learning. The Encoder (the encoding end) encodes the input sentence, converting it into an intermediate semantic representation through a nonlinear transformation. The Decoder (the decoding end) generates the word for a given time step from the intermediate semantic representation of the sentence and the history generated so far. When an out-of-vocabulary (unregistered) word appears in the generated sentence, the word with the highest attention value at the time step at which that word was generated can be obtained and used to replace it, which improves the readability of the summary.
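The out-of-vocabulary replacement described above can be sketched as follows. The sketch assumes the decoder exposes, for each generated token, its attention weights over the source tokens; the function name `replace_unknown_words`, the `<unk>` placeholder, and the list-of-lists attention format are hypothetical choices for illustration.

```python
def replace_unknown_words(source_tokens, output_tokens, attention, unk="<unk>"):
    """Replace each unknown output token with the most-attended source token.

    attention[t][j] is the attention weight on source token j at the time
    step at which output token t was generated.
    """
    result = []
    for step, token in enumerate(output_tokens):
        if token == unk:
            weights = attention[step]
            # Index of the source token with the highest attention value.
            best = max(range(len(source_tokens)), key=weights.__getitem__)
            token = source_tokens[best]
        result.append(token)
    return result


source = ["the", "zyx", "rose"]
generated = ["<unk>", "rose"]
attention = [
    [0.1, 0.8, 0.1],   # weights when "<unk>" was generated
    [0.2, 0.1, 0.7],   # weights when "rose" was generated
]
print(replace_unknown_words(source, generated, attention))
```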
Before the multiple first critical sentences are compressed to obtain the second critical sentences, the sentence summarization model can also be trained. The specific steps are:
identifying the entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
training the sentence summarization model on the tag-replaced text data set until the model converges, where the text data set may be the Gigaword data set.
Step S104: judge whether the length of the first critical sentence is greater than or equal to a preset length threshold; if so, execute step S105; if not, execute step S106.
To obtain a more robust summary, that is, one that stays reasonably faithful to the facts while remaining as readable as possible, it can be judged whether the length of the first critical sentence is greater than or equal to the preset length threshold, and the corresponding operation is executed according to the result.
Step S105: use the second critical sentence corresponding to the first critical sentence as the first critical sentence to be synthesized.
If the length of the first critical sentence is greater than or equal to the preset length threshold, then, to keep the word count of the final summary within a reasonable length and to improve readability, the second critical sentence corresponding to the first critical sentence can be used as the first critical sentence to be synthesized.
Step S106: directly use the first critical sentence as the first critical sentence to be synthesized.
If the length of the first critical sentence is less than the preset length threshold, the first critical sentence extracted from the document can be considered to meet the word-count requirement of the final summary, and it is used directly as the first critical sentence to be synthesized.
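Steps S104 to S106 amount to a single comparison, sketched below. The source does not state how sentence length is measured or what the threshold value is; counting whitespace-separated words, the function name `choose_sentence`, and the threshold used in the example are assumptions.

```python
def choose_sentence(first_sentence, second_sentence, length_threshold):
    """Return the sentence to be synthesized for one extracted sentence.

    Length is measured in whitespace-separated words (an assumption).
    """
    if len(first_sentence.split()) >= length_threshold:
        return second_sentence   # long sentence: use the compressed version
    return first_sentence        # short sentence: keep it unchanged


long_first = "entity-1 said on Monday that it would open number-1 new offices"
compressed = "entity-1 to open number-1 offices"
print(choose_sentence(long_first, compressed, length_threshold=10))
print(choose_sentence("entity-1 agrees", "entity-1", length_threshold=10))
```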
Step S107: generate the summary of the document from all first critical sentences to be synthesized.
Specifically, the tags in each first critical sentence to be synthesized can be restored to the corresponding entities and numbers to obtain the corresponding second critical sentence to be synthesized; the second critical sentences to be synthesized are then arranged in the order in which their corresponding original sentences appear in the document, and the summary of the document is generated.
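Tag restoration and ordering can be sketched as follows. The sketch assumes a tag-to-string mapping was recorded when the tags were inserted and that each sentence carries the position of its original sentence in the document; the names `restore_tags` and `assemble_summary` and the space-joined output are hypothetical.

```python
import re

# Matches tags of the form entity-n or number-n.
TAG_RE = re.compile(r"\b(?:entity|number)-\d+\b")


def restore_tags(sentence, mapping):
    """Replace every known tag with the entity or number it stands for."""
    return TAG_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), sentence)


def assemble_summary(candidates, mapping):
    """Build the summary from (position_in_document, sentence) pairs."""
    ordered = sorted(candidates, key=lambda pair: pair[0])
    return " ".join(restore_tags(sentence, mapping) for _, sentence in ordered)


mapping = {"entity-1": "China", "number-1": "56"}
candidates = [
    (3, "entity-1 is large."),
    (1, "There are number-1 nationalities in entity-1."),
]
print(assemble_summary(candidates, mapping))
```

Tags that are missing from the mapping are left unchanged, so a failed lookup stays visible in the output instead of silently dropping text.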
Referring to Table 1, which illustrates the ROUGE values of the extraction-generation hybrid summary generation method of this embodiment and of an attention-based sequence-to-sequence (S2S+attn) model on the CNN/Daily Mail data set (100 documents randomly sampled as test data). The sentence-headline training data set contains 3,803,957 data pairs, the validation data set contains 189,651 data pairs, and the test data set contains 1,951 data pairs. As can be seen from Table 1, the method of this embodiment significantly improves the ROUGE-1 and ROUGE-L metrics. In addition, the sentence summarization model of this embodiment is trained on the Gigaword data set, drawing on the idea of transfer learning, whereas the existing S2S+attn model is trained on the CNN/Daily Mail data set; the model of this embodiment therefore has better transferability.
Table 1: Comparison of ROUGE values between the present invention and the sequence-to-sequence model (S2S+attn)
Referring to Fig. 2, which illustrates the main framework of the extraction-generation hybrid summary generation method in this embodiment. As shown in Fig. 2, the main framework of the method is as follows:
First, multiple first critical sentences are extracted from the original document. Each first critical sentence is then compressed by the pre-built sentence summarization model to obtain the corresponding second critical sentence. Finally, according to the comparison between the length of the first critical sentence and the preset length threshold, either the first critical sentence or the second critical sentence is selectively used as the first critical sentence to be synthesized, and the summary of the document is generated from all first critical sentences to be synthesized.
The extraction-generation hybrid summary generation method provided by the invention combines the advantages of extractive and generative document summarization: it can generate a summary that conforms to the semantics of the document while also ensuring readability. Because the first critical sentences are first extracted from the document with an extractive document summarization method, less important text content is filtered out, so that the generative summarization method can subsequently generate the summary of the document quickly and a high-precision document summary is obtained.
Although the steps in the above embodiment are described in the order given, those skilled in the art will understand that, to achieve the effect of this embodiment, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and these simple variations all fall within the protection scope of the present invention.
Those skilled in the art should recognize that the method steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in electronic hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe or indicate a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, those skilled in the art can make equivalent changes or replacements to the relevant technical features, and the technical solutions after such changes or replacements will fall within the protection scope of the present invention.
Claims (6)
1. An extraction-generation hybrid summary generation method, characterized by comprising:
identifying the entities and numbers in a document and replacing the entities and numbers in the document with preset tags;
extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method;
compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence;
judging whether the length of the first critical sentence is greater than or equal to a preset length threshold: if so, using the second critical sentence corresponding to the first critical sentence as the first critical sentence to be synthesized; if not, directly using the first critical sentence as the first critical sentence to be synthesized;
generating the summary of the document from all first critical sentences to be synthesized.
2. The extraction-generation hybrid summary generation method according to claim 1, characterized in that the step of "extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method" comprises:
extracting multiple first critical sentences from the tag-replaced document using an extractive document summarization method based on a submodular function;
obtaining, from the document before tag replacement, the original critical sentence corresponding to each first critical sentence;
sorting the first critical sentences according to the order in which their original critical sentences appear in the document before tag replacement.
3. The extraction-generation hybrid summary generation method according to claim 1, characterized in that the step of "compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence" comprises:
compressing the first critical sentence based on a pre-built sentence summarization model to obtain the corresponding second critical sentence;
wherein the sentence summarization model is a model built on an attention mechanism.
4. The extraction-generation hybrid summary generation method according to claim 3, characterized in that the step of "compressing the first critical sentence based on a pre-built sentence summarization model to obtain the corresponding second critical sentence" comprises:
obtaining the out-of-vocabulary (unregistered) words generated when the first critical sentence is compressed;
obtaining the word with the highest attention value at the time step at which the out-of-vocabulary word was generated, and replacing the out-of-vocabulary word with that word.
5. The extraction-generation hybrid summary generation method according to claim 4, characterized in that, before the step of "compressing each of the multiple first critical sentences to obtain the second critical sentence corresponding to each first critical sentence", the method further comprises:
identifying the entities and numbers in a preset text data set;
replacing the entities and numbers in the text data set with preset tags;
training the sentence summarization model on the tag-replaced text data set.
6. The extraction-generation hybrid summary generation method according to any one of claims 1 to 5, characterized in that the step of "generating the summary of the document from all first critical sentences to be synthesized" comprises:
restoring the tags in each first critical sentence to be synthesized to the corresponding entities and numbers to obtain the corresponding second critical sentence to be synthesized;
generating the summary of the document from the second critical sentences to be synthesized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811238086.6A CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811238086.6A CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109597886A true CN109597886A (en) | 2019-04-09 |
CN109597886B CN109597886B (en) | 2021-07-06 |
Family
ID=65957961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811238086.6A Active CN109597886B (en) | 2018-10-23 | 2018-10-23 | Extraction generation mixed abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109597886B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119444A (en) * | 2019-04-23 | 2019-08-13 | 中电科大数据研究院有限公司 | A kind of official document summarization generation model that extraction-type is combined with production |
CN111026861A (en) * | 2019-12-10 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Text abstract generation method, text abstract training method, text abstract generation device, text abstract training device, text abstract equipment and text abstract training medium |
CN111581358A (en) * | 2020-04-08 | 2020-08-25 | 北京百度网讯科技有限公司 | Information extraction method and device and electronic equipment |
CN111858913A (en) * | 2020-07-08 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Method and system for automatically generating text abstract |
CN112732901A (en) * | 2021-01-15 | 2021-04-30 | 联想(北京)有限公司 | Abstract generation method and device, computer readable storage medium and electronic equipment |
CN113011160A (en) * | 2019-12-19 | 2021-06-22 | 中国移动通信有限公司研究院 | Text abstract generation method, device, equipment and storage medium |
CN113032552A (en) * | 2021-05-25 | 2021-06-25 | 南京鸿程信息科技有限公司 | Text abstract-based policy key point extraction method and system |
CN113836892A (en) * | 2021-09-08 | 2021-12-24 | 灵犀量子(北京)医疗科技有限公司 | Sample size data extraction method and device, electronic equipment and storage medium |
CN116205234A (en) * | 2023-04-24 | 2023-06-02 | 中国电子科技集团公司第二十八研究所 | Text recognition and generation algorithm based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1609845A (en) * | 2003-10-22 | 2005-04-27 | 国际商业机器公司 | Method and apparatus for improving readability of machine-generated abstracts |
US20090210381A1 (en) * | 2008-02-15 | 2009-08-20 | Yahoo! Inc. | Search result abstract quality using community metadata |
CN104503958A (en) * | 2014-11-19 | 2015-04-08 | 百度在线网络技术(北京)有限公司 | Method and device for generating document summarization |
CN108228541A (en) * | 2016-12-22 | 2018-06-29 | 深圳市北科瑞声科技股份有限公司 | Method and apparatus for generating document summary |
2018-10-23: CN application CN201811238086.6A (CN109597886B), status Active
Non-Patent Citations (3)
Title |
---|
Caglar Gulcehre et al.: "Pointing the Unknown Words", Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics * |
仲夏199603: "Extractive document summarization methods (1)", HTTPS://WWW.PIANSHEN.COM/ARTICLE/52201321841/IT610 * |
Yin Cunyan et al.: "Automatic summarization technology for text on the Internet", Computer Engineering (计算机工程) * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119444B (en) * | 2019-04-23 | 2023-06-30 | 中电科大数据研究院有限公司 | Official document summary generation model combining extractive and generative approaches |
CN110119444A (en) * | 2019-04-23 | 2019-08-13 | 中电科大数据研究院有限公司 | Official document summary generation model combining extractive and generative approaches |
CN111026861A (en) * | 2019-12-10 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Text abstract generation method, text abstract training method, text abstract generation device, text abstract training device, text abstract equipment and text abstract training medium |
CN111026861B (en) * | 2019-12-10 | 2023-07-04 | 腾讯科技(深圳)有限公司 | Text abstract generation method, training device, training equipment and medium |
CN113011160A (en) * | 2019-12-19 | 2021-06-22 | 中国移动通信有限公司研究院 | Text abstract generation method, device, equipment and storage medium |
CN111581358A (en) * | 2020-04-08 | 2020-08-25 | 北京百度网讯科技有限公司 | Information extraction method and device and electronic equipment |
CN111581358B (en) * | 2020-04-08 | 2023-08-18 | 北京百度网讯科技有限公司 | Information extraction method and device and electronic equipment |
CN111858913A (en) * | 2020-07-08 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Method and system for automatically generating text abstract |
CN112732901A (en) * | 2021-01-15 | 2021-04-30 | 联想(北京)有限公司 | Abstract generation method and device, computer readable storage medium and electronic equipment |
CN113032552A (en) * | 2021-05-25 | 2021-06-25 | 南京鸿程信息科技有限公司 | Text abstract-based policy key point extraction method and system |
CN113032552B (en) * | 2021-05-25 | 2021-08-27 | 南京鸿程信息科技有限公司 | Text abstract-based policy key point extraction method and system |
CN113836892A (en) * | 2021-09-08 | 2021-12-24 | 灵犀量子(北京)医疗科技有限公司 | Sample size data extraction method and device, electronic equipment and storage medium |
CN113836892B (en) * | 2021-09-08 | 2023-08-08 | 灵犀量子(北京)医疗科技有限公司 | Sample size data extraction method and device, electronic equipment and storage medium |
CN116205234A (en) * | 2023-04-24 | 2023-06-02 | 中国电子科技集团公司第二十八研究所 | Text recognition and generation algorithm based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN109597886B (en) | 2021-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109597886A (en) | | Extraction generation mixed abstract generation method |
CN106919673B (en) | | Text sentiment analysis system based on deep learning |
CN110134953B (en) | | Traditional Chinese medicine named entity recognition method and recognition system based on traditional Chinese medicine ancient book literature |
CN108763483A (en) | | Text information extraction method for judgment documents |
CN109684648A (en) | | Multi-feature-fusion automatic translation method between ancient and modern Chinese |
CN105243129A (en) | | Commodity attribute feature word clustering method |
CN106407235B (en) | | Semantic dictionary construction method based on comment data |
CN104199871A (en) | | High-speed test question inputting method for intelligent teaching |
CN107247739B (en) | | Financial bulletin text knowledge extraction method based on factor graph |
CN111695346B (en) | | Method for improving public opinion entity recognition rate in financial risk prevention and control field |
CN109933796A (en) | | Method and device for extracting key information from bulletin text |
CN107368474A (en) | | Automatic and efficient Chinese-to-braille translation conversion method |
CN110046356A (en) | | Application study of label embedding in multi-label emotion classification of microblog text |
CN110110087A (en) | | Feature engineering method for legal text classification based on two classifiers |
CN107436931B (en) | | Webpage text extraction method and device |
CN106570133A (en) | | Method and device for constructing visual webpage information extraction rules |
CN111178047B (en) | | Ancient medical record prescription extraction method based on hierarchical sequence labeling |
CN116737924A (en) | | Medical text data processing method and device |
CN113268714B (en) | | Automatic extraction method for license terms of open source software |
Al-Sultany et al. | | Enriching tweets for topic modeling via linking to the wikipedia |
CN110516069B (en) | | Fasttext-CRF-based quotation metadata extraction method |
CN109857746A (en) | | Automatic update method and device for bilingual word bank, and electronic equipment |
CN114722829A (en) | | Automatic generation method of ancient poems based on language model |
CN113990421A (en) | | Electronic medical record named entity identification method based on data enhancement |
CN109918622A (en) | | Java-based method and system for converting Word documents to LaTeX documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||