CN110362674B - Microblog news abstract extraction type generation method based on convolutional neural network - Google Patents
- Publication number
- CN110362674B (application CN201910650915.XA)
- Authority
- CN
- China
- Prior art keywords
- abstract
- data set
- text
- content
- news
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a microblog news abstract extractive generation method based on a convolutional neural network, which relates to the field of natural language processing and comprises the following steps: capturing microblog website content as an initial news data set Q by using a data acquisition module; processing the news data set Q to obtain a data set Q'; constructing a convolutional neural network to extract event elements from the processed news data set Q' and obtain the abstract content S; and further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary. The method enables news workers and others to further and rapidly analyze and retrieve the generated abstract content, removes semantically repeated content with the text similarity algorithm, and balances the relevance and diversity of the extracted content with the MMR model, yielding a more comprehensive and accurate content abstract.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a microblog news abstract extraction type generation method based on a convolutional neural network.
Background
Automatic text generation is an important research direction in the field of natural language processing and has broad application prospects: it can be applied to human-computer interaction tasks such as intelligent question answering and machine translation, and automatic text generation systems can also be used for automatic writing of news manuscripts, library retrieval, and the like.
In the fields of natural language processing and artificial intelligence, automatic text generation technology already has several influential achievements and applications. For example, the Associated Press has used news-writing software to automatically write news manuscripts reporting company earnings since July 2014, which greatly reduces the workload of journalists.
The key technology in automatic text generation is text summary generation: a given document or document set is automatically analyzed, its key information is extracted, and a short summary is finally output. Current text summarization methods fall into two main categories: extractive and abstractive. The extractive approach is mainly based on sentence extraction, that is, the sentences of the original text are taken as units to be scored and extracted. The abstractive approach generally performs syntactic and semantic analysis of the text with natural language understanding techniques, fuses the information, and generates new summary sentences with natural language generation techniques.
Among prior art documents, the summary generation system based on a deep neural network proposed in patent CN201610232659.9 and the summary generation system based on deep learning and an attention mechanism proposed in patent CN201811416029.2 are both abstractive. The summaries they generate contain only partial keywords, so a correct word order cannot be formed and the performance of the generated summaries is not satisfactory.
Disclosure of Invention
The invention aims to provide a microblog news abstract extraction type generation method based on a convolutional neural network, so that the problems in the prior art are solved.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a microblog news abstract extraction type generation method based on a convolutional neural network comprises the following steps:
s1, capturing microblog website contents as an initial news data set Q by using a data acquisition module;
s2, processing the news data set Q to obtain a data set Q';
s3, constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S;
and S4, further processing the abstract content S by using a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary.
Preferably, the processing of the news data set Q in step S2 consists of filtering, merging of similar items and deduplication, and specifically comprises:
S21, traversing all samples of the news data set Q and removing pictures, videos and HTML tags to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-place label matrix T = {(t_i, loc_i) | i = 1, 2, ..., N_tmp}, wherein t_i is the time value, loc_i is the place value, and N_tmp is the total number of samples;
S23, traversing the label matrix T obtained in step S22 and merging the samples of the news data set Q_tmp whose label vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, wherein M is the total number of samples.
Preferably, step S3 specifically includes:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_j, l_j) | j = 1, 2, ..., K},
wherein l_j is the label of the text single sentence c_j obtained after segmentation, l_j ∈ {time, place, event description, cause, passage, result}, and K is the total number of single sentences in the model data set;
S32, extracting the feature vector of each text single sentence in the model data set D to obtain the news data set feature matrix;
S33, constructing a convolutional neural network, denoted TextCNN, whose structure comprises a convolutional layer, a max-pooling layer, two fully-connected layers and a softmax layer;
S34, randomly dividing the feature data of the model data set D into a training set, a test set and a validation set in the ratio 4:2:1;
S35, training the convolutional neural network TextCNN constructed in step S33 with the training set and validation set divided in step S34 to obtain the trained network Model;
and S36, extracting a summary from the test set of step S34 with the Model obtained in step S35, obtaining a set of text single sentences comprising only time, place, event description, passage, cause and result, which is recorded as the abstract content S.
Preferably, step S32 specifically includes:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain the weight matrix W_1 = diag(w_1, w_2, ..., w_n),
wherein w_i is the TF-IDF feature value of the i-th word of the text single sentence c_1 with respect to its vocabulary V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of the text single sentence c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1,
wherein the i-th row f_i is the Word2Vec feature vector of the i-th word of the vocabulary V_1, and m is the dimension of the feature vector, here taken as m = 300;
3) combining the weight matrix W_1 obtained in step 1) with the feature matrix F_{n×m} obtained in step 2) to obtain the weighted feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained in step 3) row by row to obtain the normalized feature matrix F̃;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set {(F̃_j, l_j) | j = 1, 2, ..., K} of the model data set, wherein l_j is the j-th label of the model data set and K is the total number of single sentences of the model data set.
Preferably, step S4 specifically includes:
S41, traversing all text single sentences in the abstract content S and calculating the cosine similarity value between each pair of text single sentences;
S42, filtering out of the abstract content S the single sentences whose cosine similarity value exceeds a given threshold, obtaining the deduplicated abstract content S';
S43, processing the abstract content S' with the maximal marginal relevance (MMR) model to obtain the extracted abstract text.
Preferably, step S43 specifically includes:
(1) traversing the text single sentences of the abstract content S' and obtaining a candidate abstract sentence s by the following formula;
(2) adding the candidate abstract sentence s obtained in the above step to the candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, i.e. the extracted abstract text, wherein C is a positive integer no greater than the total number of sentences in S'.
Preferably, the formula adopted in step (1) is:
s = argmax_{s_i ∈ S'\summary} [ λ · sim_1(s_i, S') − (1 − λ) · max_{s_j ∈ summary} sim_2(s_i, s_j) ],
wherein λ takes the value 0.9, sim_1(s_i, S') denotes the cosine similarity between sentence s_i of the abstract content S' and the whole abstract content S', and sim_2(s_i, s_j) denotes the cosine similarity between sentence s_i and a sentence s_j of the candidate abstract set summary, the set summary being initially empty.
Preferably, the data collection module in step S1 is a real-time crawler module.
The invention has the beneficial effects that:
the microblog news abstract extraction type generation method based on the convolutional neural network has the following advantages:
1. The microblog news abstract extractive generation method based on a convolutional neural network extracts the microblog news content as a summary; the extracted sentences have better readability, making it convenient for news workers and others to further rapidly analyze and retrieve the generated abstract content.
2. The abstract extraction method adopts TF-IDF-weighted Word2Vec word vectors and further uses a convolutional neural network that comprehensively considers multiple sentence features to classify sentences by importance, completing the extraction of content covering the six news elements, namely time, place, event description, passage, cause and result, and thereby completing summary generation.
3. The invention adopts a text similarity algorithm to remove semantically repeated content and a maximal marginal relevance (MMR) model to balance the relevance and diversity of the extracted content, obtaining a more comprehensive and accurate content abstract.
Drawings
FIG. 1 is a flowchart of the extractive abstract generation method in embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a convolutional neural network in embodiment 1 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The embodiment provides a convolutional neural network-based microblog news abstract extractive generation method, as shown in FIG. 1, which includes the following steps:
S1, capturing microblog website content as an initial news data set with a real-time crawler module and recording it as the news data set Q = {q_1, q_2, ..., q_N}, wherein q_i is the i-th sample of the news data set, i = 1, 2, ..., N, and N is the total number of samples of the news data set;
S2, filtering the news data set Q, merging similar items and removing duplicates to obtain the data set Q'. The specific steps are as follows:
S21, traversing all samples of the news data set Q and removing pictures, videos and HTML tags to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-place label matrix T = {(t_i, loc_i) | i = 1, 2, ..., N_tmp}, wherein t_i is the time value, loc_i is the place value, and N_tmp is the total number of samples;
S23, traversing the label matrix T obtained in step S22 and merging the samples of the news data set Q_tmp whose label vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, wherein M is the total number of samples.
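The cleaning and merging steps S21 to S23 above can be sketched in Python as follows. The HTML-stripping regex, the toy date and place patterns and the (time, place) merge key are illustrative assumptions; the patent does not specify how t_i and loc_i are extracted from each sample.

```python
# Sketch of steps S21-S23: clean crawled samples, label each with a
# (time, place) pair, and merge samples whose label vectors coincide.
import re

def clean_sample(text):
    """S21: strip HTML tags (pictures/videos arrive as tags in crawled text)."""
    return re.sub(r"<[^>]+>", "", text).strip()

def extract_time_place(text):
    """Toy stand-in for the patent's time/place extraction (t_i, loc_i)."""
    time = re.search(r"\d{4}-\d{2}-\d{2}", text)
    place = re.search(r"in ([A-Z][a-z]+)", text)
    return (time.group() if time else None,
            place.group(1) if place else None)

def merge_by_label(samples):
    """S22-S23: group samples by their (time, place) label vector."""
    groups = {}
    for s in samples:
        cleaned = clean_sample(s)
        groups.setdefault(extract_time_place(cleaned), []).append(cleaned)
    # Q' keeps one merged sample per distinct label vector
    return [" ".join(g) for g in groups.values()]

samples = ["<img src='a.jpg'>Flood in Wuhan on 2019-07-18.",
           "Flood in Wuhan on 2019-07-18. Rescue under way.",
           "Storm in Beijing on 2019-07-17."]
q_prime = merge_by_label(samples)
```

Here the first two samples share the label vector ("2019-07-18", "Wuhan") and are merged into one element of Q'.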
S3, constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S, and the specific steps are as follows:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_j, l_j) | j = 1, 2, ..., K},
wherein l_j is the label of the text single sentence c_j obtained after segmentation, l_j ∈ {time, place, event description, cause, passage, result}, and K is the total number of single sentences in the model data set;
S32, extracting the feature vector of each text single sentence in the model data set D to obtain the news data set feature matrix:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain the weight matrix W_1 = diag(w_1, w_2, ..., w_n),
wherein w_i is the TF-IDF feature value of the i-th word of the text single sentence c_1 with respect to its vocabulary V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of the text single sentence c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1,
wherein the i-th row f_i is the Word2Vec feature vector of the i-th word of the vocabulary V_1, and m is the dimension of the feature vector, here taken as m = 300;
3) combining the weight matrix W_1 obtained in step 1) with the feature matrix F_{n×m} obtained in step 2) to obtain the weighted feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained in step 3) row by row to obtain the normalized feature matrix F̃;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set {(F̃_j, l_j) | j = 1, 2, ..., K} of the model data set, wherein l_j is the j-th label of the model data set and K is the total number of single sentences of the model data set.
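Steps 1) to 4) above can be sketched with NumPy as follows. Real Word2Vec vectors (m = 300) and corpus-level TF-IDF values are replaced by small random stand-ins; only the diagonal weighting and row normalization follow the scheme described above.

```python
# Sketch of step S32: TF-IDF-weighted Word2Vec sentence features.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 300                      # n words in sentence c_1, m-dim embeddings

w = rng.random(n)                  # stand-in TF-IDF values w_1..w_n
W = np.diag(w)                     # weight matrix W_1 = diag(w_1, ..., w_n)
F = rng.normal(size=(n, m))        # stand-in Word2Vec matrix F_{n x m}

F_weighted = W @ F                 # step 3): each row scaled by its TF-IDF weight
# step 4): row-wise L2 normalization
norms = np.linalg.norm(F_weighted, axis=1, keepdims=True)
F_tilde = F_weighted / np.where(norms == 0, 1.0, norms)
```

After normalization every row of F̃ is a unit vector, so the later cosine-similarity steps reduce to dot products between rows.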
S33, constructing a convolutional neural network, denoted TextCNN, as shown in FIG. 2; the TextCNN network structure consists of a convolutional layer, a max-pooling layer, two fully-connected layers and a softmax layer.
In this embodiment, the convolutional layer has 256 convolution kernels of size 5, the activation function is the ReLU function, the fully-connected layer has 128 neurons, the learning rate is 0.001, and the dropout rate is 0.5;
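A minimal NumPy forward pass with the shapes named in this embodiment (256 kernels of width 5 over the word dimension, ReLU, global max pooling, a 128-unit fully-connected layer and a 6-way softmax for the six sentence labels) can be sketched as follows. The weights are random stand-ins, not the patent's trained model, and the sentence length is an arbitrary example.

```python
# Untrained forward pass through a TextCNN-shaped network.
import numpy as np

rng = np.random.default_rng(1)
n_words, emb = 20, 300                       # one sentence's feature matrix
x = rng.normal(size=(n_words, emb))

K1 = rng.normal(size=(256, 5, emb)) * 0.01   # 256 conv kernels of width 5
W1 = rng.normal(size=(256, 128)) * 0.01      # fully-connected layer 1
W2 = rng.normal(size=(128, 6)) * 0.01        # fully-connected layer 2 -> 6 labels

# convolution over word positions, then ReLU
conv = np.stack([[np.sum(K1[k] * x[i:i + 5]) for i in range(n_words - 4)]
                 for k in range(256)])       # shape (256, n_words - 4)
conv = np.maximum(conv, 0.0)
pooled = conv.max(axis=1)                    # global max pooling -> (256,)
h = np.maximum(pooled @ W1, 0.0)             # dense 128 + ReLU
logits = h @ W2
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the six classes
```

The dropout rate (0.5) and learning rate (0.001) quoted above apply only during training and are omitted from this inference-only sketch.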
S34, randomly dividing the feature data of the model data set D into a training set, a test set and a validation set in the ratio 4:2:1;
S35, training the convolutional neural network TextCNN constructed in step S33 with the training set and validation set divided in step S34 to obtain the trained network Model;
and S36, extracting a summary from the test set of step S34 with the Model obtained in step S35, obtaining a set of text single sentences comprising only time, place, event description, passage, cause and result, which is recorded as the abstract content S.
S4, further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary. Step S4 specifically comprises the following steps:
S41, traversing all text single sentences in the abstract content S and calculating the cosine similarity value between each pair of text single sentences;
S42, filtering out of the abstract content S the single sentences whose cosine similarity value exceeds a given threshold, obtaining the deduplicated abstract content S';
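The pairwise-similarity deduplication of steps S41 and S42 can be sketched as follows. The 0.8 threshold is an assumption for illustration; the patent only states that sentences above a similarity threshold are filtered out.

```python
# Sketch of steps S41-S42: cosine similarity over sentence vectors and
# greedy removal of near-duplicate sentences.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dedup(sent_vecs, threshold=0.8):
    """Keep a sentence only if it is not too similar to any kept one."""
    keep = []
    for i, v in enumerate(sent_vecs):
        if all(cosine(v, sent_vecs[j]) < threshold for j in keep):
            keep.append(i)
    return keep

vecs = [np.array([1.0, 0.0]),   # sentence 0
        np.array([0.99, 0.1]),  # near-duplicate of sentence 0
        np.array([0.0, 1.0])]   # dissimilar sentence
kept = dedup(vecs)
```

With these toy vectors the second sentence is dropped as a near-duplicate of the first.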
S43, processing the abstract content S' obtained in the above steps with the maximal marginal relevance (MMR) model to obtain the extracted abstract text.
Step S43 specifically includes:
(1) traversing the text single sentences of the abstract content S' and obtaining a candidate abstract sentence s by the following formula:
s = argmax_{s_i ∈ S'\summary} [ λ · sim_1(s_i, S') − (1 − λ) · max_{s_j ∈ summary} sim_2(s_i, s_j) ],
wherein λ takes the value 0.9, sim_1(s_i, S') denotes the cosine similarity between sentence s_i of the abstract content S' and the whole abstract content S', and sim_2(s_i, s_j) denotes the cosine similarity between sentence s_i and a sentence s_j of the candidate abstract set summary, the set summary being initially empty.
(2) adding the candidate abstract sentence s obtained in the above step to the candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, i.e. the extracted abstract text, wherein C is a positive integer no greater than the total number of sentences in S'.
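The selection loop of steps (1) to (3) can be sketched as follows, with λ = 0.9 as in the embodiment. Scoring each sentence against the centroid of the whole deduplicated content is one plausible reading of sim_1 (the patent only says "the whole abstract content"), so treat it as an assumption.

```python
# Sketch of step S43: maximal marginal relevance (MMR) selection.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr_select(sent_vecs, C, lam=0.9):
    centroid = np.mean(sent_vecs, axis=0)          # proxy for the whole content
    chosen = []                                    # the candidate set 'summary'
    while len(chosen) < C:
        best, best_score = None, -np.inf
        for i, v in enumerate(sent_vecs):
            if i in chosen:
                continue
            redundancy = max((cosine(v, sent_vecs[j]) for j in chosen),
                             default=0.0)          # summary starts empty
            score = lam * cosine(v, centroid) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

vecs = [np.array([1.0, 0.2]),
        np.array([0.9, 0.3]),
        np.array([0.1, 1.0])]
order = mmr_select(vecs, C=2)
```

The redundancy term penalizes sentences close to ones already selected, which is how the model balances relevance against diversity.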
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
1. The microblog news abstract extractive generation method based on a convolutional neural network extracts the microblog news content as a summary; the extracted sentences have better readability, making it convenient for news workers and others to further rapidly analyze and retrieve the generated abstract content.
2. The abstract extraction method adopts TF-IDF-weighted Word2Vec word vectors and further uses a convolutional neural network that comprehensively considers multiple sentence features to classify sentences by importance, completing the extraction of content covering the six news elements, namely time, place, event description, passage, cause and result, and thereby completing summary generation.
3. The invention adopts a text similarity algorithm to remove semantically repeated content and a maximal marginal relevance (MMR) model to balance the relevance and diversity of the extracted content, obtaining a more comprehensive and accurate content abstract.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.
Claims (2)
1. A microblog news abstract extraction type generation method based on a convolutional neural network is characterized by comprising the following steps:
s1, capturing microblog website contents as an initial news data set Q by using a data acquisition module;
s2, processing the news data set Q to obtain a data set Q';
s3, constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S;
S4, further processing the abstract content S by using a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary;
in step S2, the processing of the news data set Q consists of filtering, merging of similar items and deduplication, and specifically comprises:
S21, traversing all samples of the news data set Q and removing pictures, videos and HTML tags to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-place label matrix T = {(t_i, loc_i) | i = 1, 2, ..., N_tmp}, wherein t_i is the time value, loc_i is the place value, and N_tmp is the total number of samples;
S23, traversing the label matrix T obtained in step S22 and merging the samples of the news data set Q_tmp whose label vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, wherein M is the total number of samples;
step S3 specifically includes:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_j, l_j) | j = 1, 2, ..., K},
wherein l_j is the label of the text single sentence c_j obtained after segmentation, l_j ∈ {time, place, event description, cause, passage, result}, and K is the total number of single sentences in the model data set;
S32, extracting the feature vector of each text single sentence in the model data set D to obtain the news data set feature matrix;
S33, constructing a convolutional neural network, denoted TextCNN, whose structure comprises a convolutional layer, a max-pooling layer, two fully-connected layers and a softmax layer;
S34, randomly dividing the feature data of the model data set D into a training set, a test set and a validation set in the ratio 4:2:1;
S35, training the convolutional neural network TextCNN constructed in step S33 with the training set and validation set divided in step S34 to obtain the trained network Model;
S36, extracting a summary from the test set of step S34 with the Model obtained in step S35, obtaining a set of text single sentences comprising only time, place, event description, passage, cause and result, which is recorded as the abstract content S;
step S32 specifically includes:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain the weight matrix W_1 = diag(w_1, w_2, ..., w_n),
wherein w_i is the TF-IDF feature value of the i-th word of the text single sentence c_1 with respect to its vocabulary V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of the text single sentence c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1,
wherein the i-th row f_i is the Word2Vec feature vector of the i-th word of the vocabulary V_1, and m is the dimension of the feature vector, here taken as m = 300;
3) combining the weight matrix W_1 obtained in step 1) with the feature matrix F_{n×m} obtained in step 2) to obtain the weighted feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained in step 3) row by row to obtain the normalized feature matrix F̃;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set {(F̃_j, l_j) | j = 1, 2, ..., K} of the model data set, wherein l_j is the j-th label of the model data set and K is the total number of single sentences of the model data set;
step S4 specifically includes:
S41, traversing all text single sentences in the abstract content S and calculating the cosine similarity value between each pair of text single sentences;
S42, filtering out of the abstract content S the single sentences whose cosine similarity value exceeds a given threshold, obtaining the deduplicated abstract content S';
S43, processing the abstract content S' with the maximal marginal relevance (MMR) model to obtain the extracted abstract text;
step S43 specifically includes:
(1) traversing the text single sentences of the abstract content S' and obtaining a candidate abstract sentence s by a formula;
(2) adding the candidate abstract sentence s obtained in the above step to the candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, i.e. the extracted abstract text, wherein C is a positive integer no greater than the total number of sentences in S';
the formula adopted in step (1) is:
s = argmax_{s_i ∈ S'\summary} [ λ · sim_1(s_i, S') − (1 − λ) · max_{s_j ∈ summary} sim_2(s_i, s_j) ],
wherein λ takes the value 0.9, sim_1(s_i, S') denotes the cosine similarity between sentence s_i of the abstract content S' and the whole abstract content S', and sim_2(s_i, s_j) denotes the cosine similarity between sentence s_i and a sentence s_j of the candidate abstract set summary, the set summary being initially empty.
2. The extraction-type generation method of microblog news digests based on the convolutional neural network as claimed in claim 1, wherein the data acquisition module in the step S1 is a real-time crawler module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650915.XA CN110362674B (en) | 2019-07-18 | 2019-07-18 | Microblog news abstract extraction type generation method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650915.XA CN110362674B (en) | 2019-07-18 | 2019-07-18 | Microblog news abstract extraction type generation method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110362674A CN110362674A (en) | 2019-10-22 |
CN110362674B true CN110362674B (en) | 2020-08-04 |
Family
ID=68221249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650915.XA Active CN110362674B (en) | 2019-07-18 | 2019-07-18 | Microblog news abstract extraction type generation method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110362674B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110933518B (en) * | 2019-12-11 | 2020-10-02 | 浙江大学 | Method for generating query-oriented video abstract by using convolutional multi-layer attention network mechanism |
CN111191413B (en) * | 2019-12-30 | 2021-11-12 | 北京航空航天大学 | Method, device and system for automatically marking event core content based on graph sequencing model |
CN111274776B (en) * | 2020-01-21 | 2020-12-15 | 中国搜索信息科技股份有限公司 | Article generation method based on keywords |
CN111507090A (en) * | 2020-02-27 | 2020-08-07 | 平安科技(深圳)有限公司 | Abstract extraction method, device, equipment and computer readable storage medium |
CN111639176B (en) * | 2020-05-29 | 2022-07-01 | 厦门大学 | Real-time event summarization method based on consistency monitoring |
CN111859887A (en) * | 2020-07-21 | 2020-10-30 | 北京北斗天巡科技有限公司 | Scientific and technological news automatic writing system based on deep learning |
TR202022040A1 (en) * | 2020-12-28 | 2022-07-21 | Sestek Ses Ve Iletisim Bilgisayar Tek San Tic A S | A METHOD OF MEASURING TEXT SUMMARY SUCCESS THAT IS SENSITIVE TO SUBJECT CLASSIFICATION AND A SUMMARY SYSTEM USING THIS METHOD |
CN112883716B (en) * | 2021-02-03 | 2022-05-03 | 重庆邮电大学 | Twitter abstract generation method based on topic correlation |
CN112906382B (en) * | 2021-02-05 | 2022-06-21 | 山东省计算中心(国家超级计算济南中心) | Policy text multi-label labeling method and system based on graph neural network |
CN112989031B (en) * | 2021-04-28 | 2021-08-03 | 成都索贝视频云计算有限公司 | Broadcast television news event element extraction method based on deep learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834735B (en) * | 2015-05-18 | 2018-01-23 | 大连理工大学 | A kind of documentation summary extraction method based on term vector |
CN106055658A (en) * | 2016-06-02 | 2016-10-26 | 中国人民解放军国防科学技术大学 | Extraction method aiming at Twitter text event |
US10706349B2 (en) * | 2017-05-25 | 2020-07-07 | Texas Instruments Incorporated | Secure convolutional neural networks (CNN) accelerator |
CN109977219B (en) * | 2019-03-19 | 2021-04-09 | 国家计算机网络与信息安全管理中心 | Text abstract automatic generation method and device based on heuristic rule |
- 2019
- 2019-07-18: CN CN201910650915.XA, patent CN110362674B (en), active
Also Published As
Publication number | Publication date |
---|---|
CN110362674A (en) | 2019-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110362674B (en) | Microblog news abstract extraction type generation method based on convolutional neural network | |
CN110413986B (en) | Text clustering multi-document automatic summarization method and system for improving word vector model | |
CN111914558B (en) | Course knowledge relation extraction method and system based on sentence bag attention remote supervision | |
CN111966917B (en) | Event detection and summarization method based on pre-training language model | |
CN110825877A (en) | Semantic similarity analysis method based on text clustering | |
Bisandu et al. | Clustering news articles using efficient similarity measure and N-grams | |
CN113569050B (en) | Method and device for automatically constructing government affair field knowledge map based on deep learning | |
CN108268875B (en) | Image semantic automatic labeling method and device based on data smoothing | |
CN107480200A (en) | Word mask method, device, server and the storage medium of word-based label | |
CN112667940B (en) | Webpage text extraction method based on deep learning | |
CN115718792A (en) | Sensitive information extraction method based on natural semantic processing and deep learning | |
CN112131453A (en) | Method, device and storage medium for detecting network bad short text based on BERT | |
CN111597423B (en) | Performance evaluation method and device of interpretable method of text classification model | |
CN114492425B (en) | Method for communicating multi-dimensional data by adopting one set of field label system | |
CN112685549B (en) | Document-related news element entity identification method and system integrating discourse semantics | |
Thilagavathi et al. | Document clustering in forensic investigation by hybrid approach | |
CN115017404A (en) | Target news topic abstracting method based on compressed space sentence selection | |
CN110019814B (en) | News information aggregation method based on data mining and deep learning | |
Jadhav et al. | Unstructured big data information extraction techniques survey: Privacy preservation perspective | |
CN113326371A (en) | Event extraction method fusing pre-training language model and anti-noise interference remote monitoring information | |
Zeng et al. | Fake news detection by using common latent semantics matching method | |
CN112765940A (en) | Novel webpage duplicate removal method based on subject characteristics and content semantics | |
Souvannavong et al. | Latent semantic indexing for semantic content detection of video shots | |
Labanan et al. | A Study on the Usability of Text Analysis on Web Artifacts for Digital Forensic Investigation | |
Nakanishi | Semantic Waveform Model for Similarity Measure by Time-series Variation in Meaning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |