CN107526841A - A kind of Tibetan language text summarization generation method based on Web - Google Patents
- Publication number: CN107526841A (application CN201710847326.1A)
- Authority: CN (China)
- Prior art keywords: sentence, article, text, vocabulary, weight
- Legal status: Pending (the legal status is an assumption by Google, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The present invention relates to a Web-based Tibetan text summarization generation method comprising the following steps: matching the sentences of the original article against a topic-word lexicon and computing a weight for each sentence; ranking the sentences by weight and selecting a percentage of the article's sentences as summary sentences; and re-ordering the extracted sentences according to their order in the original text and splicing them together to produce the summary. The invention adopts an extractive approach to automatic summary generation, selecting a fixed number of sentences that best represent the subject of the text. This makes it easier for people to obtain Tibetan-language information and improves the efficiency with which they do so.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a Web-based Tibetan text summarization generation method.
Background technology
An abstract is a short text whose purpose is to outline the content of a document: it describes the document's important content concisely and precisely, without comments or additional explanation. Automatic summarization technology uses a computer to analyze a text and extract the content that reflects its theme, thereby forming a summary. Summaries can be divided into different classes according to different criteria. By the relation between the summary's content and the source document, summaries fall into extractive summaries and generative summaries: an extractive summary is composed of sentences extracted from the source text, whereas a generative summary is produced on the basis of discourse-level semantic understanding, so not all of its content appears verbatim in the source. By the amount of source text summarized, summaries divide into single-document and multi-document summaries: single-document summarization extracts a summary from one source text, while multi-document summarization produces a comprehensive summary of several source documents on the same topic. This work performs extractive single-document summarization.
To date, many researchers have studied abstract extraction for Chinese and English texts extensively. In terms of research strategy, work on automatic summarization can be divided into three periods: mechanical summarization, comprehension-based summarization, and hybrid summarization. Mechanical summarization is not restricted to a particular domain and its methods are simple, but the summary quality is limited; comprehension-based summarization attains high quality but only within a narrow domain, and it is difficult to realize. Researchers therefore began to integrate multiple methods so that their strengths complement each other, leading to the emergence of hybrid summarization.
Research on automatic summarization abroad began in 1958, when H. P. Luhn of IBM in the United States carried out the first automatic abstracting experiment. Luhn used a word's number of occurrences as its weight, scored the sentences containing these high-frequency words, and extracted the highest-scoring sentences from the text as the summary. P. E. Baxendale's research pointed out the relationship between a sentence's position within its paragraph and its importance. Later researchers proposed methods based on the naive Bayes algorithm, on decision trees, on hidden Markov models, and so on, achieving a certain degree of success.
Domestic research on automatic summarization started later; driven by the spread of computers in China and the network era's demand for processing information flows, it gradually developed in the 1990s. Beginning in 1988, Professor Wang Yongcheng of Shanghai Jiao Tong University worked on automatic abstracting of Chinese documents, successively developing a Chinese document automatic abstracting pilot system, the Automatic Abstracting System on Chinese Documents (CAES), and, in 1997, the OA Chinese document automatic abstracting system. The OA system employs a human-imitating algorithm that jointly considers position, indicator expressions, keywords, titles, and other factors; it is not restricted to a particular domain and is a fairly practical system. In addition, Professor Wang Kaizhu of Harbin Institute of Technology combined statistics-based mechanical summarization with meaning-based comprehension summarization to develop the HIT-97I English automatic summarization system.
At present there is little research on automatic summarization for Tibetan. The main prior work is a Web text summarization method based on sentence extraction, in which the weight of each Web sentence is decomposed mainly into a Web feature-word component and a Web sentence-structure component; a fixed number of sentences are then selected as the summary according to the sentence weights.
With the country's strong support for information-technology construction in Tibetan areas, the number of Tibetan-language websites is growing rapidly, which provides a large corpus for research on Tibetan text summarization. Conversely, the extraction of Tibetan web-page summaries supports the informatization of Tibetan areas and makes the retrieval of Tibetan-language information more convenient: a reader can quickly judge whether the original text contains content of interest and find the information actually needed, instead of wasting time reading irrelevant documents. This greatly improves the efficiency of information acquisition and therefore has real value for social development and economic construction.
Summary of the invention
At present there is little research on Tibetan automatic summarization. To help people obtain Tibetan-language information conveniently, the present invention proposes an extractive automatic summary generation method that selects a fixed number of sentences best representing the subject of the text to compose the summary, making it easier to obtain Tibetan-language information while improving the efficiency of information acquisition.
To achieve the above object, the invention provides a Web-based Tibetan text summarization generation method comprising the following steps: matching the sentences of the original article against a topic-word lexicon and computing a weight for each sentence; ranking the sentences by weight and selecting a percentage of the article's sentences as summary sentences; and re-ordering the extracted sentences according to their order in the original text and splicing them together to produce the summary.
Preferably, the topic-word lexicon is built as follows: by counting word frequencies in the article, a fixed number of high-frequency words are added to the candidate topic vocabulary; the segmented article is matched against a domain keyword list, and the matched keywords are added to the candidate topic vocabulary; some words are extracted from the article title and added to the candidate topic vocabulary; finally, the topic-word lexicon is extracted from these three vocabularies by a topic-word extraction algorithm.
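The three-source lexicon construction described above can be sketched as follows. This is an illustrative Python sketch; all function and parameter names (and the default of ten high-frequency words) are our assumptions, not the patent's.

```python
from collections import Counter

def build_topic_lexicon(words, domain_keywords, title_words,
                        num_high_freq=10, stopwords=()):
    """Candidate topic vocabulary from three sources, as in the method:
    high-frequency words, matched domain keywords, and title words.
    `words` is the segmented article, `title_words` the segmented title."""
    counts = Counter(w for w in words if w not in stopwords)
    # 1. a fixed number of high-frequency words become candidates
    candidates = {w for w, _ in counts.most_common(num_high_freq)}
    # 2. domain keywords that actually occur in the article
    keyword_set = set(domain_keywords)
    candidates |= {w for w in words if w in keyword_set}
    # 3. words extracted from the title (minus stop words)
    candidates |= {w for w in title_words if w not in stopwords}
    return candidates
```

The final topic-word extraction step over the merged candidates is not specified in detail by the patent, so this sketch simply returns the union of the three candidate sets.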
Preferably, the sentence weight function is designed as W(Sk) = Wp(Sk) + Σi wki + Σj wkj + Σm wkm + Wc(Sk), where W(Sk) denotes the weight of sentence Sk, Wp(Sk) the sentence position weight, wki the weighting of high-frequency words for the sentence, wkj the weighting of keywords for sentence Sk, wkm the weighting of title words for sentence Sk, and Wc(Sk) the weighting of clue words for sentence Sk.
Preferably, to avoid repetition in the summary, the novelty of each sentence is calculated; the sentence similarity formula is Sim(Si, Sj) = Σk wik·wjk / (√(Σk wik²) · √(Σk wjk²)), where Sim(Si, Sj) denotes the similarity between sentence Si and sentence Sj.
Preferably, the sentences of the original article are obtained by segmenting the text into words and removing meaningless stop words; the stop-word list is determined by filtering the highest-frequency words.
Preferably, the step of re-ordering the extracted sentences according to their order in the original text and splicing them into the summary comprises: after filtering redundant sentences, re-ordering the extracted summary sentences by their positions in the original text and splicing them together as the summary.
Preferably, the factors influencing the sentence weight calculation include one or more of: word frequency, domain keywords, title, position, and clue words.
The present invention applies extractive summarization to a single Tibetan text, selecting a fixed number of sentences that best represent the subject of the text to compose the summary; this makes it easier for people to obtain Tibetan-language information and improves the efficiency of information acquisition.
Brief description of the drawings
Fig. 1 is a schematic flow chart of an automatic summary generation method provided by an embodiment of the present invention;
Fig. 2 is a screenshot of a test sample in an embodiment of the present invention;
Fig. 3 is a screenshot of abstract extraction in an embodiment of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a schematic flow chart of a Web-based Tibetan text summarization generation method provided by an embodiment of the present invention.
As shown in Fig. 1, the specific steps of the automatic summary generation method are as follows:
Step S110: match the sentences of the original article against the topic-word lexicon and compute the weight of each sentence.
The sentence is the basic unit of linguistic expression. For Chinese text and Tibetan text alike, the sentence is the smallest unit possessing a complete semantic and logical structure and complete syntax, and it can express the relations among multiple semantic objects. Therefore, the sentence is chosen here as the elementary unit of summary extraction.
In Chinese, punctuation marks are auxiliary symbols of written text used to indicate pauses, tone, and the nature and function of words. The stop marks indicate pauses of different lengths and include the full stop, question mark, exclamation mark, comma, enumeration comma, semicolon, and colon; together with the other marks, these symbols divide an article into many sentences and so ease the understanding of its meaning. Tibetan differs from Chinese; its primary punctuation symbols are listed below.
Table 1: Tibetan punctuation symbols
Statistical analysis of network text shows that the single shad (the vertical stroke ། ) is mainly used to divide sentences in web text. The single shad is therefore used here as the separator for sentence segmentation and extraction.
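Sentence segmentation on the single shad can be sketched as follows. The code point U+0F0D for the shad is our assumption: the patent shows the separator only as an embedded image, and the function name is illustrative.

```python
def split_sentences(text, separator="\u0f0d"):
    """Split Tibetan web text into sentences on the single shad (U+0F0D),
    the delimiter the method observed to mark sentence boundaries,
    dropping empty fragments and surrounding whitespace."""
    parts = [s.strip() for s in text.split(separator)]
    return [s for s in parts if s]
```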
This work adopts extractive summarization: an extractive automatic summarization method selects a fixed number of sentences that best represent the subject of the text to compose the summary. Sentences are selected according to the importance of each sentence within the text, so every sentence Sk must be assigned a weight, denoted w(Sk). The calculation of the sentence weight w(Sk) here considers the following factors:
(1) Word frequency
When writing an article, people tend to reuse the words most closely related to its theme. In a statistical sense, therefore, the more frequently a word occurs, the more likely it is to be related to the theme the article expresses. Within an article, a word that occurs more often is more important and makes its sentence more representative, with the exception of words that cannot represent the article's meaning, namely stop words.
(2) Domain keywords
Domain keywords reflect the text themes of the associated field well, so sentences containing such keywords can be regarded as candidate summary sentences.
(3) Title
The title is a phrase supplied by the author to indicate the content of the article, and it reflects the article's theme. Here the title is segmented into words and its stop words are removed using a stop-word list (stoplist); the remaining words usually bear a close relation to the theme of the original text.
(4) Position
Position is an important feature. P. E. Baxendale's survey in the United States showed that the topic sentence of a paragraph is its first sentence with probability 85% and its last sentence with probability 7%. Therefore, the weights of sentences in the first and last paragraphs of the article, and of the first and last sentences of each paragraph, should be increased appropriately.
(5) Clue words
Sentences containing certain special words are more likely to be selected into a summary than others; we call such words summary clue words. If a sentence contains summarizing expressions such as "this paper discusses", "this paper proposes", "all in all", or "finally", the sentence can likely summarize the meaning of the article, and its weight should be increased appropriately.
Based on the analysis of the above factors, the sentence weight function is designed as follows:
W(Sk) = Wp(Sk) + Σi wki + Σj wkj + Σm wkm + Wc(Sk)
where W(Sk) denotes the weight of sentence Sk;
Wp(Sk) denotes the sentence position weight, whose value is set according to the sentence position as follows:
wki denotes the weighting of high-frequency words for the sentence; its specific value is
wki = (sum of the weights of the high-frequency words contained in Sk) / lk
where lk denotes the length of sentence Sk. In general, a longer sentence tends to contain more high-frequency words, so the count of high-frequency words must be normalized by sentence length to eliminate the influence of length: the sum of the high-frequency-word weights is divided by the total number of entries the sentence contains, giving the sentence's average entry weight.
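The length-normalized high-frequency-word weighting just described can be sketched as follows (an illustrative Python sketch; the function name and the `hf_weights` mapping are our assumptions).

```python
def high_freq_weight(sentence_words, hf_weights):
    """Average entry weight of a sentence: the summed weights of its
    high-frequency words divided by the sentence's total entry count l_k,
    normalizing away the effect of sentence length.
    `hf_weights` maps each high-frequency word to its weight; other
    words contribute zero."""
    if not sentence_words:
        return 0.0
    total = sum(hf_weights.get(w, 0.0) for w in sentence_words)
    return total / len(sentence_words)
```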
wkj denotes the weighting of keywords for sentence Sk; its value is set as follows:
wkm denotes the weighting of title words for sentence Sk; its value is set as follows:
Wc(Sk) denotes the weighting of clue words for sentence Sk; its value is set as follows:
Step S120: rank the sentences by weight and select a percentage of the article's sentences as summary sentences.
Step S130: re-order the extracted sentences according to their order in the original text and splice them together to produce the summary.
The summary of the original text is generated here by an extractive method; the extraction algorithm is as follows:
Input: a Tibetan text
Output: a text summary
Process:
(1) extract the sentences of the text;
(2) filter out sentences that are too short or too long;
(3) compute the novelty between sentences to filter out redundant sentences;
(4) compute the weight of each sentence according to formula (3);
(5) generate the text summary;
(6) output the generated summary.
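The six-step extraction process above can be sketched end to end as follows. This is a minimal Python sketch under stated assumptions: sentences are token lists (so `len` gives word count), `weight_fn` stands in for the patent's weight formula, `sim_fn` for its novelty measure, and the 0.8 redundancy cutoff is ours; the length bounds 5 and 40 and the 30% ratio come from the text.

```python
def generate_summary(sentences, weight_fn, ratio=0.3,
                     min_len=5, max_len=40, sim_fn=None, sim_threshold=0.8):
    """Extractive pipeline: length filter, novelty filter, weight ranking,
    selection of the top share, and restoration of original order."""
    # (2) filter sentences that are too short or too long
    kept = [(i, s) for i, s in enumerate(sentences)
            if min_len <= len(s) <= max_len]
    # (3) drop near-duplicates of already-kept sentences (novelty filter)
    if sim_fn is not None:
        unique = []
        for i, s in kept:
            if all(sim_fn(s, t) < sim_threshold for _, t in unique):
                unique.append((i, s))
        kept = unique
    # (4) rank by weight and keep the top `ratio` share (at least one)
    n = max(1, int(len(kept) * ratio))
    top = sorted(kept, key=lambda p: weight_fn(p[1]), reverse=True)[:n]
    # (5) restore original order and return the summary sentences
    return [s for _, s in sorted(top)]
```

A toy run with sentence length as a stand-in weight: from four sentences of lengths 6, 3, 10, and 50 words, the 3-word and 50-word sentences are filtered out and, at a 30% ratio, only the highest-weighted survivor is returned.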
The following factors are considered when selecting summary sentences:
(1) Filtering sentences that are too long or too short.
Very long or very short sentences rarely appear in an article's summary, so they are unsuitable as summary sentences. Here sentence length is measured in Tibetan words; based on statistics, the minimum and maximum length thresholds are chosen as 5 and 40 respectively.
(2) Filtering redundant sentences.
When writing an article, people often repeat sentences that reflect its central idea in order to emphasize it. Such sentences are all likely to be selected as summary sentences, causing repetition in the summary. Therefore, the novelty of each sentence is computed here during summary-sentence selection, by calculating the cosine similarity between sentences. Using the vector space model (VSM), sentence i is represented as Si(wi1, ..., wik, ..., win), where wik denotes the weight of a term in the sentence; here the number of occurrences of a feature word in the sentence is used as the feature value. The sentence similarity formula is:
Sim(Si, Sj) = Σk wik·wjk / (√(Σk wik²) · √(Σk wjk²))
where Sim(Si, Sj) denotes the similarity between sentence Si and sentence Sj.
(3) Sentence weight.
After the over-long and over-short sentences in the article are filtered out, the weights of the remaining sentences are computed using the topic-word lexicon.
Once the sentence weights are computed, the sentences are ranked by weight, and the top 30% of the total number of sentences are chosen as candidate summary sentences. Sentence redundancy is then computed over the candidates and redundant sentences are filtered out. Finally, the extracted summary sentences are re-ordered according to their order in the original text and spliced together as the summary.
As shown in Fig. 2, a test sample was chosen from the acquired Tibetan corpus for case analysis.
Before summary sentences are chosen, i.e. before sentence weights are computed, the over-long and over-short sentences in the text must be removed. As shown in Fig. 2, sentence (2) was filtered out by the configured thresholds (sentence length below 5 or above 40), and the remaining sentences were renumbered.
The topic-word lexicon was obtained through word frequency statistics and keyword matching on the text. Sentence weights were then computed according to formula (1); Table 2 lists the weights of the text's sentences. The second column is the initial assignment made according to each sentence's position; the third column is the result of computing each sentence's weight by formula (3); the fourth column is the result of ranking the sentences by weight.
Table 2: Sentence weights
Fig. 3 is a screenshot of abstract extraction in this embodiment. As shown in Fig. 3, the summary ratio is set here to 30%: the sentences in the top 30% of the weight ranking of Table 2 are selected from the 12 sentences of the text, rounding down, giving the four sentences (3), (13), (9), and (10); these are then re-ordered according to their positions in the original text to form the summary of this text.
Comparing the original text and the summary via Fig. 2 and Fig. 3, the summarization achieved the expected result: the extracted summary sentences broadly reflect the main content of the original text.
The quality of the automatic summaries is evaluated here by comparison with manual summaries, which are extracted by hand by Tibetan-speaking annotators. The evaluation is carried out in units of sentences by computing the precision P, the recall R, and the F value, of which the F value is the most important index. The three indices are computed by formulas (3), (4), and (5):
P = A / (A + C),  R = A / (A + B),  F = 2PR / (P + R)
where P is the precision, R is the recall, and:
A: the number of sentences that are in the automatic summary and are also marked as summary sentences;
B: the number of sentences not in the automatic summary but marked as summary sentences;
C: the number of sentences in the automatic summary but not marked as summary sentences.
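The evaluation above can be computed as follows. The formulas themselves are images in the source, so this sketch uses the standard precision/recall/F1 definitions implied by the counts A, B, and C; the function name is illustrative.

```python
def prf(a, b, c):
    """Precision, recall and F value from the counts defined above:
    a = sentences in both the automatic and the manual summary,
    b = manually marked sentences missing from the automatic summary,
    c = automatically extracted sentences not manually marked."""
    p = a / (a + c) if a + c else 0.0   # precision of the automatic summary
    r = a / (a + b) if a + b else 0.0   # recall against the manual summary
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```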
Twenty Tibetan texts were randomly selected from the corpus; after generating the automatic summaries and comparing them with the manual summaries, the precision P, recall R, and F value of each article were computed, as shown in Table 3.
Table 3: P, R, and F values
The resulting averages of P, R, and F are 69.35%, 70.95%, and 70.1% respectively. Judging from the F value, the summarization achieves a fairly satisfactory result.
The above embodiments further describe the object, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely an embodiment of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (8)
1. A Web-based Tibetan text summarization generation method, characterized by comprising the following steps:
matching the sentences of the original article against a topic-word lexicon and computing a weight for each sentence;
ranking the sentences by weight and selecting a percentage of the article's sentences as summary sentences;
re-ordering the extracted sentences according to their order in the original text and splicing them together to produce the summary.
2. The summary generation method according to claim 1, characterized in that the topic-word lexicon is built as follows:
by counting word frequencies in the article, adding a fixed number of high-frequency words to the candidate topic vocabulary;
matching the segmented article against a domain keyword list and adding the matched keywords to the candidate topic vocabulary;
extracting some words from the article title and adding them to the candidate topic vocabulary;
finally extracting the topic-word lexicon from the above three vocabularies by a topic-word extraction algorithm.
3. The automatic summary generation method according to claim 1, characterized in that the sentence weight function is designed as W(Sk) = Wp(Sk) + Σi wki + Σj wkj + Σm wkm + Wc(Sk), where W(Sk) denotes the weight of sentence Sk, Wp(Sk) the sentence position weight, wki the weighting of high-frequency words for the sentence, wkj the weighting of keywords for sentence Sk, wkm the weighting of title words for sentence Sk, and Wc(Sk) the weighting of clue words for sentence Sk.
4. The automatic summary generation method according to claim 1, characterized in that, to avoid repetition in the summary, sentence novelty is calculated; the sentence similarity formula is Sim(Si, Sj) = Σk wik·wjk / (√(Σk wik²) · √(Σk wjk²)), where Sim(Si, Sj) denotes the similarity between sentence Si and sentence Sj.
5. The automatic summary generation method according to claim 1, characterized in that the sentences of the original article are obtained by segmenting the text into words and removing meaningless stop words; the stop words are determined by filtering the highest-frequency words.
6. The automatic summary generation method according to claim 1, characterized in that the step of re-ordering the extracted sentences according to their order in the original text and splicing them into the summary comprises:
after filtering redundant sentences, re-ordering the extracted summary sentences by their order in the original text and splicing them together as the summary.
7. The automatic summary generation method according to claim 1, characterized in that the factors influencing the sentence weight calculation include one or more of: word frequency, domain keywords, title, position, and clue words.
8. The automatic summary generation method according to claim 1, characterized in that the step of ranking by sentence weight and selecting a percentage of the article's sentences as summary sentences comprises:
ranking the sentences by weight and selecting 30% of the article's sentences as summary sentences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710847326.1A CN107526841A (en) | 2017-09-19 | 2017-09-19 | A kind of Tibetan language text summarization generation method based on Web |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107526841A true CN107526841A (en) | 2017-12-29 |
Family
ID=60737091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710847326.1A Pending CN107526841A (en) | 2017-09-19 | 2017-09-19 | A kind of Tibetan language text summarization generation method based on Web |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107526841A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101393545A (en) * | 2008-11-06 | 2009-03-25 | 新百丽鞋业(深圳)有限公司 | Method for implementing automatic abstracting by utilizing association model |
CN106021226A (en) * | 2016-05-16 | 2016-10-12 | 中国建设银行股份有限公司 | Text abstract generation method and apparatus |
CN107133213A (en) * | 2017-05-06 | 2017-09-05 | 广东药科大学 | A kind of text snippet extraction method and system based on algorithm |
Non-Patent Citations (1)
Title |
---|
南奎娘若: "Research on Tibetan text summary extraction based on sensitive information" (基于敏感信息的藏文文本摘要提取的研究), Network Security (《网络安全》) * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815328A (en) * | 2018-12-28 | 2019-05-28 | 东软集团股份有限公司 | A kind of abstraction generating method and device |
CN109815328B (en) * | 2018-12-28 | 2021-05-25 | 东软集团股份有限公司 | Abstract generation method and device |
CN110489543A (en) * | 2019-08-14 | 2019-11-22 | 北京金堤科技有限公司 | A kind of extracting method and device of news in brief |
CN110781291A (en) * | 2019-10-25 | 2020-02-11 | 北京市计算中心 | Text abstract extraction method, device, server and readable storage medium |
CN111159393A (en) * | 2019-12-30 | 2020-05-15 | 电子科技大学 | Text generation method for abstracting abstract based on LDA and D2V |
CN111159393B (en) * | 2019-12-30 | 2023-10-10 | 电子科技大学 | Text generation method for abstract extraction based on LDA and D2V |
CN111651588A (en) * | 2020-06-10 | 2020-09-11 | 扬州大学 | Article abstract information extraction algorithm based on directed graph |
CN111651588B (en) * | 2020-06-10 | 2024-03-05 | 扬州大学 | Article abstract information extraction algorithm based on directed graph |
CN111797225A (en) * | 2020-06-16 | 2020-10-20 | 北京北大软件工程股份有限公司 | Text abstract generation method and device |
CN111797225B (en) * | 2020-06-16 | 2023-08-22 | 北京北大软件工程股份有限公司 | Text abstract generation method and device |
CN112328946A (en) * | 2020-12-10 | 2021-02-05 | 青海民族大学 | Method and system for automatically generating Tibetan language webpage abstract |
CN114996444A (en) * | 2022-06-28 | 2022-09-02 | 中国人民解放军63768部队 | Automatic news summarization method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171229 |