CN110362674B - Microblog news abstract extraction type generation method based on convolutional neural network - Google Patents


Info

Publication number
CN110362674B
CN110362674B (application CN201910650915.XA)
Authority
CN
China
Prior art keywords
abstract
data set
text
content
news
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910650915.XA
Other languages
Chinese (zh)
Other versions
CN110362674A (en)
Inventor
滕辉
刘肖萌
龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinaso Information Technology Co ltd
Original Assignee
Chinaso Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinaso Information Technology Co ltd filed Critical Chinaso Information Technology Co ltd
Priority to CN201910650915.XA priority Critical patent/CN110362674B/en
Publication of CN110362674A publication Critical patent/CN110362674A/en
Application granted granted Critical
Publication of CN110362674B publication Critical patent/CN110362674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities


Abstract

The invention discloses a microblog news abstract extractive generation method based on a convolutional neural network, relating to the field of natural language processing and comprising the following steps: capturing microblog website content as an initial news data set Q with a data acquisition module; processing the news data set Q to obtain a data set Q'; constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S; and further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary. The method allows news workers and others to rapidly analyze and retrieve the generated abstract content, removes semantically repeated content with a text similarity algorithm, and balances the relevance and diversity of the extracted content with the MMR model to obtain a more comprehensive and accurate content abstract.

Description

Microblog news abstract extraction type generation method based on convolutional neural network
Technical Field
The invention relates to the field of natural language processing, in particular to a microblog news abstract extraction type generation method based on a convolutional neural network.
Background
Automatic text generation is an important research direction in the field of natural language processing. The technology has broad application prospects: it can be applied to human-computer interaction tasks such as intelligent question answering and machine translation, and can also be used to realize automatic writing of news manuscripts, library retrieval, and the like.
In the fields of natural language processing and artificial intelligence, automatic text generation has already produced several influential achievements and applications; for example, the Associated Press has used news-writing software to automatically write news manuscripts reporting company performance since July 2014, which greatly reduces the workload of journalists.
The key technology in automatic text generation is text abstract generation: a given document or document set is automatically analyzed, its key information is extracted, and finally a short abstract is output. Current text summarization methods fall into two categories. The first is the extractive method, which is mainly based on sentence extraction, that is, sentences of the original text are taken as units for evaluation and extraction. The second is the abstractive (generative) method, which generally performs syntactic and semantic analysis of the text with natural language understanding technology, fuses the information, and generates new abstract sentences with natural language generation technology.
Among prior-art documents, the abstract generation system based on a deep neural network proposed in patent CN201610232659.9 and the abstract generation system based on deep learning and an attention mechanism proposed in patent CN201811416029.2 are both abstractive. Their generated abstracts contain only some of the keywords, often fail to form a correct word order, and their performance is unsatisfactory.
Disclosure of Invention
The invention aims to provide a microblog news abstract extraction type generation method based on a convolutional neural network, so that the problems in the prior art are solved.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a microblog news abstract extraction type generation method based on a convolutional neural network comprises the following steps:
s1, capturing microblog website contents as an initial news data set Q by using a data acquisition module;
s2, processing the news data set Q to obtain a data set Q';
s3, constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S;
and S4, further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary.
Preferably, the processing of the news data set Q in step S2 consists of filtering, similar-sample merging and deduplication, and specifically includes:
S21, traversing all samples of the news data set Q and removing pictures, videos and html labels to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-and-place tag matrix Tag = [(t_i, loc_i)], where t_i is the time value, loc_i is the place value, i = 1, 2, ..., N_tmp, and N_tmp is the total number of samples;
S23, traversing the tag matrix obtained in step S22 and merging the samples of the news data set Q_tmp whose tag vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, where M is the total number of samples.
Preferably, step S3 specifically includes:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_1, l_1), (c_2, l_2), ..., (c_K, l_K)}, where l_j is the label of the segmented text single sentence c_j, l_j ∈ {time, place, event description, cause, pass, result}, j = 1, 2, ..., K, and K is the total number of single sentences of the model data set;
S32, extracting the feature vector of each text single sentence of the model data set D to obtain the news data set feature set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)};
S33, constructing a convolutional neural network, denoted TextCNN, whose structure comprises a convolutional layer, a max-pooling layer, 2 fully-connected layers and a softmax layer;
S34, randomly dividing the feature set F_D of the model data set into a training set, a test set and a validation set in the ratio 4:2:1;
S35, training the convolutional neural network TextCNN obtained in step S33 with the training set and validation set divided in step S34 to obtain a trained network Model;
and S36, abstracting the test set of step S34 with the Model obtained in step S35 to obtain a set of text single sentences containing only time, place, event description, cause, pass and result, denoted abstract content S.
Preferably, step S32 specifically includes:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain a weight matrix W_1 = diag(w_1, w_2, ..., w_n), where w_i is the TF-IDF feature value of the ith word of c_1, the corresponding vocabulary is V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1, whose ith row f_i is the Word2Vec feature vector of the ith word of V_1, m being the dimension of the feature vector, with m = 300;
3) weighting the feature matrix F_{n×m} obtained in step 2) with the weight matrix W_1 obtained in step 1) to obtain the feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained above by rows to obtain the normalized feature matrix F~_1;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set of the model data set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)}, where l_i is the ith label of the model data set and K is the total number of single sentences of the model data set.
Preferably, step S4 specifically includes:
S41, traversing all text single sentences of the abstract content S and computing the cosine similarity between each pair of text single sentences;
S42, filtering out of the abstract content S the text single sentences whose cosine similarity exceeds the given threshold, obtaining the deduplicated abstract content S~;
S43, processing the deduplicated abstract content S~ with the maximal marginal relevance (MMR) model to obtain the extracted abstract text.
Preferably, step S43 specifically includes:
(1) traversing the text single sentences of the deduplicated abstract content S~ and selecting a candidate abstract text s with the formula given below;
(2) adding the candidate abstract text s obtained in the above step to the candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, namely the extracted abstract text, where C is a positive integer not greater than the total number of sentences of S~.
Preferably, the formula adopted in step (1) is:

s = argmax_{s_i ∈ S~ \ summary} [ λ · sim(s_i, S~) − (1 − λ) · max_{s_j ∈ summary} sim(s_i, s_j) ]

where λ takes the value 0.9, sim(s_i, S~) denotes the cosine similarity between the ith sentence of the abstract content S~ and the whole abstract content S~, and sim(s_i, s_j) denotes the cosine similarity between the ith sentence of S~ and a sentence already in the candidate abstract set summary, the second term being set to zero when summary is empty.
Preferably, the data collection module in step S1 is a real-time crawler module.
The invention has the beneficial effects that:
the microblog news abstract extraction type generation method based on the convolutional neural network has the following advantages:
1. according to the microblog news abstract extracting type generating method based on the convolutional neural network, the microblog news content is extracted in an abstract mode, the abstract sentences have better readability, and news workers and the like can further rapidly analyze and retrieve the generated abstract content conveniently.
2. The abstract extraction method adopts TF-IDF-weighted Word2Vec word vectors, then uses a convolutional neural network that comprehensively considers multiple features of a sentence to classify its importance, completes the extraction of content covering the six news elements of time, place, event description, cause, pass and result, and thereby completes abstract generation.
3. The invention adopts a text similarity algorithm to remove semantically repeated content and a maximal marginal relevance model to balance the relevance and diversity of the extracted content, obtaining a more comprehensive and accurate content abstract.
Drawings
FIG. 1 is a flowchart of an abstract abstraction type generation method in embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a convolutional neural network in embodiment 1 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The embodiment provides a convolutional neural network-based microblog news digest extraction type generation method, as shown in fig. 1, which includes the following steps:
S1, capturing microblog website contents with a real-time crawler module as an initial news data set, recorded as Q = {q_1, q_2, ..., q_N}, where q_i is the ith sample of the news data set, i = 1, 2, ..., N, and N is the total number of samples of the news data set;
S2, filtering the news data set Q, merging similar samples and removing duplicates to obtain the data set Q', specifically as follows:
S21, traversing all samples of the news data set Q and removing pictures, videos and html labels to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-and-place tag matrix Tag = [(t_i, loc_i)], where t_i is the time value, loc_i is the place value, i = 1, 2, ..., N_tmp, and N_tmp is the total number of samples;
S23, traversing the tag matrix obtained in step S22 and merging the samples of the news data set Q_tmp whose tag vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, where M is the total number of samples.
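Steps S21 to S23 above can be sketched in plain Python (a minimal sketch: the regular expressions, the sample format, and the assumption that time/place tags have already been extracted are illustrative simplifications not specified in the patent):

```python
import re

def clean_sample(text):
    # Step S21 stand-in: strip html tags and [picture]/[video] placeholders.
    text = re.sub(r"<[^>]+>", "", text)
    text = re.sub(r"\[(?:picture|video)\]", "", text)
    return text.strip()

def merge_by_tag(samples):
    # Steps S22/S23: group cleaned samples by their (time, place) tag
    # vector and merge samples sharing the same tag into one.
    merged = {}
    for time, place, text in samples:
        merged.setdefault((time, place), []).append(clean_sample(text))
    return [" ".join(texts) for texts in merged.values()]

samples = [
    ("2019-07-18", "Beijing", "<b>Flood</b> hits the city [picture]"),
    ("2019-07-18", "Beijing", "Rescue teams dispatched"),
    ("2019-07-19", "Shanghai", "Storm warning issued"),
]
merged = merge_by_tag(samples)
# Two distinct (time, place) tags remain after merging.
```

The first two samples share the tag ("2019-07-18", "Beijing"), so they are merged into a single sample, leaving two entries.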
S3, constructing a convolutional neural network to extract event elements from the processed news data set Q' and obtain the abstract content S, specifically as follows:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_1, l_1), (c_2, l_2), ..., (c_K, l_K)}, where l_j is the label of the segmented text single sentence c_j, l_j ∈ {time, place, event description, cause, pass, result}, j = 1, 2, ..., K, and K is the total number of single sentences of the model data set;
S32, extracting the feature vector of each text single sentence of the model data set D to obtain the news data set feature set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)}:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain a weight matrix W_1 = diag(w_1, w_2, ..., w_n), where w_i is the TF-IDF feature value of the ith word of c_1, the corresponding vocabulary is V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1, whose ith row f_i is the Word2Vec feature vector of the ith word of V_1, m being the dimension of the feature vector, with m = 300;
3) weighting the feature matrix F_{n×m} obtained in step 2) with the weight matrix W_1 obtained in step 1) to obtain the feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained above by rows to obtain the normalized feature matrix F~_1;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set of the model data set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)}, where l_i is the ith label of the model data set and K is the total number of single sentences of the model data set.
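Steps 1) to 4) above can be sketched in plain Python. This is a toy illustration: the two-dimensional word vectors stand in for real 300-dimensional Word2Vec embeddings, and the TF-IDF variant shown is a common textbook form, not necessarily the patent's exact one:

```python
import math

def tfidf_weights(sentence, corpus):
    # Per-word TF-IDF values: the diagonal of the weight matrix W_1.
    # Smoothed-idf variant (an assumption; the patent gives no formula text).
    n = len(sentence)
    weights = []
    for word in sentence:
        tf = sentence.count(word) / n
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log(len(corpus) / (1 + df)) + 1
        weights.append(tf * idf)
    return weights

def weighted_normalized_matrix(sentence, vectors, corpus):
    # Steps 3) and 4): F' = W_1 · F, then L2-normalize each row.
    rows = []
    for word, w in zip(sentence, tfidf_weights(sentence, corpus)):
        row = [w * x for x in vectors[word]]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows

corpus = [["rain", "hits", "beijing"], ["storm", "warning"], ["rain", "stops"]]
vectors = {"rain": [1.0, 0.0], "hits": [0.5, 0.5], "beijing": [0.0, 1.0]}
F = weighted_normalized_matrix(corpus[0], vectors, corpus)
# F is the normalized, TF-IDF-weighted feature matrix of the first sentence.
```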
S33, constructing a convolutional neural network, as shown in FIG. 2, denoted TextCNN, whose structure is a convolutional layer, a max-pooling layer, 2 fully-connected layers and a softmax layer;
in this embodiment the convolutional layer has 256 convolution kernels of size 5, the activation function is the ReLU function, the fully-connected layer has 128 neurons, the learning rate is 0.001, and the dropout (random inactivation) rate is 0.5;
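A minimal forward pass of a TextCNN-style stack can be sketched as follows. Toy dimensions and random weights are used, and only one dense layer is shown instead of two; the real network uses 256 kernels of size 5 over 300-dimensional vectors and is trained, none of which is reproduced here:

```python
import math, random

random.seed(0)

def conv1d_relu(X, kernels, k=5):
    # Convolution over time with ReLU, then max-over-time pooling:
    # each kernel spans k consecutive word vectors of the sentence matrix X.
    n, m = len(X), len(X[0])
    feats = []
    for W in kernels:                      # W has shape (k, m)
        col = []
        for t in range(n - k + 1):
            s = sum(W[i][j] * X[t + i][j] for i in range(k) for j in range(m))
            col.append(max(0.0, s))        # ReLU
        feats.append(max(col))             # max-over-time pooling
    return feats

def softmax(z):
    e = [math.exp(v - max(z)) for v in z]
    t = sum(e)
    return [v / t for v in e]

# Toy sizes: 8 words, 4-dim embeddings, 3 kernels, 6 label classes
# (time, place, event description, cause, pass, result).
n, m, n_kernels, n_classes = 8, 4, 3, 6
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
kernels = [[[random.uniform(-1, 1) for _ in range(m)] for _ in range(5)]
           for _ in range(n_kernels)]
pooled = conv1d_relu(X, kernels)
dense = [[random.uniform(-1, 1) for _ in range(n_kernels)] for _ in range(n_classes)]
logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
probs = softmax(logits)                    # class distribution for one sentence
```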
S34, randomly dividing the feature set F_D of the model data set into a training set, a test set and a validation set in the ratio 4:2:1;
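The 4:2:1 random split of step S34 can be sketched as follows (the seed and the rounding rule are illustrative assumptions):

```python
import random

def split_421(items, seed=42):
    # Shuffle, then cut into training : test : validation = 4 : 2 : 1.
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * 4 / 7)
    n_test = round(n * 2 / 7)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

train, test, val = split_421(range(700))
```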
s35, training the convolutional neural network TextCNN obtained in the step S33 by using the training set and the verification set which are divided in the step S34 to obtain a trained network Model;
and S36, abstracting the test set in the step S34 by using the Model obtained in the step S35 to obtain a text single sentence set which only comprises time, place, event description, passage, cause and result and is marked as abstract content S.
S4, further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain the extracted abstract text summary, step S4 specifically comprising:
S41, traversing all text single sentences of the abstract content S and computing the cosine similarity between each pair of text single sentences;
S42, filtering out of the abstract content S the text single sentences whose cosine similarity exceeds the given threshold, obtaining the deduplicated abstract content S~;
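Steps S41 and S42 can be sketched with a bag-of-words cosine similarity (the 0.8 threshold is an assumed value; the patent's exact cut-off appears only in its formula images):

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity of two sentences via bag-of-words count vectors.
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedup(sentences, threshold=0.8):
    # Drop any sentence too similar to one already kept (steps S41/S42).
    kept = []
    for s in sentences:
        if all(cosine(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

summary = [
    "heavy rain hit the city on friday",
    "heavy rain hit the city on friday night",
    "rescue teams were dispatched at once",
]
deduped = dedup(summary)
# The second sentence is nearly identical to the first and is filtered out.
```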
S43, processing the deduplicated abstract content S~ obtained in the above step with the maximal marginal relevance model to obtain the extracted abstract text.
Step S43 specifically includes:
(1) traversing the text single sentences of the deduplicated abstract content S~ and selecting a candidate abstract text s with the following formula:

s = argmax_{s_i ∈ S~ \ summary} [ λ · sim(s_i, S~) − (1 − λ) · max_{s_j ∈ summary} sim(s_i, s_j) ]

where λ takes the value 0.9, sim(s_i, S~) denotes the cosine similarity between the ith sentence of the abstract content S~ and the whole abstract content S~, and sim(s_i, s_j) denotes the cosine similarity between the ith sentence of S~ and a sentence already in the candidate abstract set summary, the second term being set to zero when summary is empty.
(2) Adding the candidate abstract text s obtained in the step into a candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, namely the extracted abstract text, where C is a positive integer not greater than the total number of sentences of S~.
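The selection loop of steps (1) to (3) can be sketched as follows, with a bag-of-words cosine similarity and the whole deduplicated abstract as the relevance target; the sentence strings are illustrative:

```python
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(sentences, C, lam=0.9):
    # Repeat C times: pick the sentence maximizing
    # lam * sim(s, whole) - (1 - lam) * max sim(s, already selected),
    # with the redundancy term zero while the summary is still empty.
    whole = " ".join(sentences)
    summary, candidates = [], list(sentences)
    for _ in range(C):
        def score(s):
            redundancy = max((cosine(s, t) for t in summary), default=0.0)
            return lam * cosine(s, whole) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        summary.append(best)
        candidates.remove(best)
    return summary

sentences = [
    "heavy rain flooded the streets of the city",
    "heavy rain flooded several streets",
    "rescue teams evacuated residents overnight",
]
picked = mmr_select(sentences, C=2)
```

With λ = 0.9 the relevance term dominates, while the redundancy term discourages picking two near-duplicate sentences in a row.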
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
1. according to the microblog news abstract extracting type generating method based on the convolutional neural network, the microblog news content is extracted in an abstract mode, the abstract sentences have better readability, and news workers and the like can further rapidly analyze and retrieve the generated abstract content conveniently.
2. The abstract extraction method adopts TF-IDF-weighted Word2Vec word vectors, then uses a convolutional neural network that comprehensively considers multiple features of a sentence to classify its importance, completes the extraction of content covering the six news elements of time, place, event description, cause, pass and result, and thereby completes abstract generation.
3. The invention adopts a text similarity algorithm to remove semantically repeated content and a maximal marginal relevance model to balance the relevance and diversity of the extracted content, obtaining a more comprehensive and accurate content abstract.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (2)

1. A microblog news abstract extraction type generation method based on a convolutional neural network is characterized by comprising the following steps:
s1, capturing microblog website contents as an initial news data set Q by using a data acquisition module;
s2, processing the news data set Q to obtain a data set Q';
s3, constructing a convolutional neural network to extract event elements from the processed news data set Q' to obtain abstract content S;
S4, further processing the abstract content S with a text similarity algorithm and a maximal marginal relevance model to obtain the extracted abstract text summary;
in step S2, the processing of the news data set Q consists of filtering, similar-sample merging and deduplication, and specifically includes:
S21, traversing all samples of the news data set Q and removing pictures, videos and html labels to obtain the news data set Q_tmp;
S22, traversing all samples of the news data set Q_tmp obtained in step S21, extracting the time and place of each sample, and recording them as a time-and-place tag matrix Tag = [(t_i, loc_i)], where t_i is the time value, loc_i is the place value, i = 1, 2, ..., N_tmp, and N_tmp is the total number of samples;
S23, traversing the tag matrix obtained in step S22 and merging the samples of the news data set Q_tmp whose tag vectors are identical, obtaining the news data set Q' = {q'_1, q'_2, ..., q'_M}, where M is the total number of samples;
step S3 specifically includes:
S31, traversing all samples of the news data set Q', performing single-sentence segmentation and manual labeling on the samples to obtain a model data set D = {(c_1, l_1), (c_2, l_2), ..., (c_K, l_K)}, where l_j is the label of the segmented text single sentence c_j, l_j ∈ {time, place, event description, cause, pass, result}, j = 1, 2, ..., K, and K is the total number of single sentences of the model data set;
S32, extracting the feature vector of each text single sentence of the model data set D to obtain the news data set feature set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)};
S33, constructing a convolutional neural network, and recording the convolutional neural network as TextCNN, wherein the TextCNN network structure comprises a convolutional layer, a maximum pooling layer, 2 full-link layers and a softmax layer;
S34, randomly dividing the feature set F_D of the model data set into a training set, a test set and a validation set in the ratio 4:2:1;
s35, training the convolutional neural network TextCNN obtained in the step S33 by using the training set and the verification set which are divided in the step S34 to obtain a trained network Model;
s36, abstracting the test set in the step S34 by using the Model obtained in the step S35 to obtain a text single sentence set which only comprises time, place, event description, passage, cause and result and is marked as abstract content S;
step S32 specifically includes:
1) extracting the TF-IDF features of the text single sentence c_1 of the model data set D to obtain a weight matrix W_1 = diag(w_1, w_2, ..., w_n), where w_i is the TF-IDF feature value of the ith word of c_1, the corresponding vocabulary is V_1 = {v_1, v_2, ..., v_n}, and n is the total number of words of c_1;
2) extracting the Word2Vec features of the vocabulary V_1 to obtain the feature matrix F_{n×m} of the text single sentence c_1, whose ith row f_i is the Word2Vec feature vector of the ith word of V_1, m being the dimension of the feature vector, with m = 300;
3) weighting the feature matrix F_{n×m} obtained in step 2) with the weight matrix W_1 obtained in step 1) to obtain the feature matrix F' = W_1 · F_{n×m} of the text single sentence c_1;
4) normalizing the feature matrix F' obtained above by rows to obtain the normalized feature matrix F~_1;
5) traversing the model data set D and repeating steps 1) to 4) to obtain the feature set of the model data set F_D = {(F~_1, l_1), (F~_2, l_2), ..., (F~_K, l_K)}, where l_i is the ith label of the model data set and K is the total number of single sentences of the model data set;
step S4 specifically includes:
S41, traversing all text single sentences of the abstract content S and computing the cosine similarity between each pair of text single sentences;
S42, filtering out of the abstract content S the text single sentences whose cosine similarity exceeds the given threshold, obtaining the deduplicated abstract content S~;
S43, processing the deduplicated abstract content S~ with the maximal marginal relevance model to obtain the extracted abstract text;
step S43 specifically includes:
(1) traversing the text single sentences of the deduplicated abstract content S~ and selecting a candidate abstract text s with the formula given below;
(2) adding the candidate abstract text s obtained in the above step to the candidate abstract set summary;
(3) repeating steps (1) to (2) C times to obtain the candidate abstract set summary, namely the extracted abstract text, where C is a positive integer not greater than the total number of sentences of S~;
the formula adopted in step (1) is:

s = argmax_{s_i ∈ S~ \ summary} [ λ · sim(s_i, S~) − (1 − λ) · max_{s_j ∈ summary} sim(s_i, s_j) ]

where λ takes the value 0.9, sim(s_i, S~) denotes the cosine similarity between the ith sentence of the abstract content S~ and the whole abstract content S~, and sim(s_i, s_j) denotes the cosine similarity between the ith sentence of S~ and a sentence already in the candidate abstract set summary, the second term being set to zero when summary is empty.
2. The extraction-type generation method of microblog news digests based on the convolutional neural network as claimed in claim 1, wherein the data acquisition module in the step S1 is a real-time crawler module.
CN201910650915.XA 2019-07-18 2019-07-18 Microblog news abstract extraction type generation method based on convolutional neural network Active CN110362674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910650915.XA CN110362674B (en) 2019-07-18 2019-07-18 Microblog news abstract extraction type generation method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910650915.XA CN110362674B (en) 2019-07-18 2019-07-18 Microblog news abstract extraction type generation method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN110362674A CN110362674A (en) 2019-10-22
CN110362674B true CN110362674B (en) 2020-08-04

Family

ID=68221249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910650915.XA Active CN110362674B (en) 2019-07-18 2019-07-18 Microblog news abstract extraction type generation method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110362674B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110933518B (en) * 2019-12-11 2020-10-02 浙江大学 Method for generating query-oriented video abstract by using convolutional multi-layer attention network mechanism
CN111191413B (en) * 2019-12-30 2021-11-12 北京航空航天大学 Method, device and system for automatically marking event core content based on graph sequencing model
CN111274776B (en) * 2020-01-21 2020-12-15 中国搜索信息科技股份有限公司 Article generation method based on keywords
CN111507090A (en) * 2020-02-27 2020-08-07 平安科技(深圳)有限公司 Abstract extraction method, device, equipment and computer readable storage medium
CN111639176B (en) * 2020-05-29 2022-07-01 厦门大学 Real-time event summarization method based on consistency monitoring
CN111859887A (en) * 2020-07-21 2020-10-30 北京北斗天巡科技有限公司 Scientific and technological news automatic writing system based on deep learning
TR202022040A1 (en) * 2020-12-28 2022-07-21 Sestek Ses Ve Iletisim Bilgisayar Tek San Tic A S A METHOD OF MEASURING TEXT SUMMARY SUCCESS THAT IS SENSITIVE TO SUBJECT CLASSIFICATION AND A SUMMARY SYSTEM USING THIS METHOD
CN112883716B (en) * 2021-02-03 2022-05-03 重庆邮电大学 Twitter abstract generation method based on topic correlation
CN112906382B (en) * 2021-02-05 2022-06-21 山东省计算中心(国家超级计算济南中心) Policy text multi-label labeling method and system based on graph neural network
CN112989031B (en) * 2021-04-28 2021-08-03 成都索贝视频云计算有限公司 Broadcast television news event element extraction method based on deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834735B (en) * 2015-05-18 2018-01-23 大连理工大学 A kind of documentation summary extraction method based on term vector
CN106055658A (en) * 2016-06-02 2016-10-26 中国人民解放军国防科学技术大学 Extraction method aiming at Twitter text event
US10706349B2 (en) * 2017-05-25 2020-07-07 Texas Instruments Incorporated Secure convolutional neural networks (CNN) accelerator
CN109977219B (en) * 2019-03-19 2021-04-09 国家计算机网络与信息安全管理中心 Text abstract automatic generation method and device based on heuristic rule


Similar Documents

Publication Publication Date Title
CN110362674B (en) Microblog news abstract extraction type generation method based on convolutional neural network
CN110413986B (en) Text clustering multi-document automatic summarization method and system for improving word vector model
CN111914558B (en) Course knowledge relation extraction method and system based on sentence bag attention remote supervision
CN111966917B (en) Event detection and summarization method based on pre-training language model
CN110825877A (en) Semantic similarity analysis method based on text clustering
Bisandu et al. Clustering news articles using efficient similarity measure and N-grams
CN113569050B (en) Method and device for automatically constructing government affair field knowledge map based on deep learning
CN108268875B (en) Image semantic automatic labeling method and device based on data smoothing
CN107480200A (en) Word mask method, device, server and the storage medium of word-based label
CN112667940B (en) Webpage text extraction method based on deep learning
CN115718792A (en) Sensitive information extraction method based on natural semantic processing and deep learning
CN112131453A (en) Method, device and storage medium for detecting network bad short text based on BERT
CN111597423B (en) Performance evaluation method and device of interpretable method of text classification model
CN114492425B (en) Method for communicating multi-dimensional data by adopting one set of field label system
CN112685549B (en) Document-related news element entity identification method and system integrating discourse semantics
Thilagavathi et al. Document clustering in forensic investigation by hybrid approach
CN115017404A (en) Target news topic abstracting method based on compressed space sentence selection
CN110019814B (en) News information aggregation method based on data mining and deep learning
Jadhav et al. Unstructured big data information extraction techniques survey: Privacy preservation perspective
CN113326371A (en) Event extraction method fusing pre-training language model and anti-noise interference remote monitoring information
Zeng et al. Fake news detection by using common latent semantics matching method
CN112765940A (en) Novel webpage duplicate removal method based on subject characteristics and content semantics
Souvannavong et al. Latent semantic indexing for semantic content detection of video shots
Labanan et al. A Study on the Usability of Text Analysis on Web Artifacts for Digital Forensic Investigation
Nakanishi Semantic Waveform Model for Similarity Measure by Time-series Variation in Meaning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant