CN111597328A - New event theme extraction method - Google Patents
New event theme extraction method
- Publication number
- CN111597328A (application CN202010541567.5A)
- Authority
- CN
- China
- Prior art keywords
- event
- news
- text
- text data
- new event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of network information and relates to a new event topic extraction method. A news event text data set is vectorized with BERT, so that the representation captures context more closely and more accurately. A bidirectional long short-term memory (BiLSTM) network with an attention mechanism learns from large volumes of online news text to discover new events, making efficient and accurate use of the data. The method combines supervised and unsupervised techniques, which is more efficient than using either alone. It is simple, extracts semantic information in depth, and can analyze and mine online news text to discover new events, which helps regulatory departments and individual users keep track of new events in real time and supports follow-up work.
Description
Technical field:
The invention belongs to the technical field of network information and relates to a new event topic extraction method, in particular to a method that trains a new event discovery model with a bidirectional long short-term memory (BiLSTM) network based on BERT (Bidirectional Encoder Representations from Transformers) and an attention mechanism, and extracts new event topics through topic modeling analysis with multi-feature fusion.
Background art:
With the development of the internet in the big data era, people are surrounded by a large amount of news from a wide range of sources, such as newspapers and the web. Text is the most common carrier of news and the most accessible way to obtain valuable information. Because news from different sources takes many forms, the format and content of news texts are often disorganized, and the volume of news produced is so large that detecting Chinese news events entirely by hand is almost impossible. At the same time, the large body of text on the web reflects how much attention people pay to an event and how much influence it has, so mining online news text helps discover trending events as early as possible.
Trending news events have mainly been discovered by manual monitoring, which incurs a high resource cost for discovering and monitoring news events on the web. With the rise of machine learning, the commonly adopted approach is clustering: news texts are clustered to discover new events, but the accuracy of this approach is low and misidentification is common. Neural networks have since achieved major results in many fields; they remove the limitation of hand-crafted features and are better suited to big data. CN201810696452.6 provides a domain-oriented Chinese text topic sentence generation method, which builds a domain knowledge graph for a domain text data set, extracts the semantic information of a text with a deep neural network model, classifies the text by topic sentence pattern, and finally generates a topic sentence for the text. However, this method has the following disadvantages: first, it only applies to data sets from a specific domain and is not suitable for general data sets across domains; second, it requires building a domain knowledge graph, which has a large resource overhead and demands a high level of expertise; finally, it labels and classifies text data with deep learning, an operation that targets only a specific domain, so the model performs poorly on new data from a new domain. It is therefore necessary to provide a new event topic extraction method that discovers new events with deep learning and extracts their topics with topic modeling.
Summary of the invention:
The invention aims to overcome the defects of the prior art by designing and providing a method that trains a new event discovery model with a bidirectional long short-term memory (BiLSTM) network based on BERT (Bidirectional Encoder Representations from Transformers) and an attention mechanism, and extracts new event topics through multi-feature-fusion topic modeling analysis. It uses neural networks from deep learning to mine and process massive text data, and thereby analyzes and uses text data efficiently and accurately.
To achieve the above object, the process of extracting a new event topic according to the invention comprises the following steps:
Step 1: acquiring a news event text data stream according to event keywords and constructing a news event text data set from the acquired stream, wherein each record in the data set comprises an event type label of the news text and a specific text description of the event, and dividing the news event text data set into a training set Train, a validation set Val and a test set Test;
Step 2: vectorizing the training set Train, the validation set Val and the test set Test divided in step 1 with a BERT representation model to obtain a high-dimensional dense vector representation of the news event text data set, wherein the BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads;
Step 3: taking the high-dimensional dense vector representation obtained in step 2 as input, initializing the neural network parameters with Xavier initialization, and training on the training set Train and the validation set Val, using a dropout strategy and gradient descent to update the neural network parameters and the input feature vectors, to obtain a new event discovery model;
Step 4: setting a threshold for the new event discovery model; if the recognition result is greater than the threshold, judging that the event belongs to a known news event type and giving the topic of the event; if the prediction result is smaller than the threshold, judging the event to be a new event, and collecting and storing the news texts judged to be new events to obtain a new event text data set;
Step 5: removing useless information from the new event text data set obtained in step 4 while keeping the news event text's description of the news event, performing word segmentation with a Chinese word segmentation tool, and then establishing a custom dictionary to improve segmentation accuracy; the useless information comprises tokens without substantive value, such as special characters and stop words;
Step 6: extracting entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splicing them with the original text at word level to form a new news text description, and weighting the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: for the news text data set processed in step 6, calculating the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assigning each word a corresponding weight according to the result;
Step 8: clustering the new event text data set obtained in step 7 into a plurality of events with the K-means algorithm, using the features and weights obtained in steps 6 and 7, and performing topic modeling analysis on the new events; combining the topic modeling analysis result with the TF-IDF representation of the new event text set, extracting ten keywords for each event as the topic words of the new event, and thereby completing the extraction of the new event topic.
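The threshold decision in step 4 can be illustrated with a minimal sketch. It assumes the discovery model outputs one raw score per known event type and that the "recognition result" is the largest softmax probability; the source does not specify either point. The label names and scores are hypothetical, the default of 0.9 is the value given in the embodiment, and treating a result exactly equal to the threshold as a new event is an assumption.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_flag(logits, labels, threshold=0.9):
    """Return (label, confidence); label is None when the text is treated as a new event."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return labels[best], probs[best]  # known event type
    return None, probs[best]              # assumption: ties count as new events

# Hypothetical known event types and model scores, for illustration only.
labels = ["earthquake", "election", "product launch"]
known = classify_or_flag([9.0, 1.0, 0.5], labels)
novel = classify_or_flag([1.2, 1.0, 0.9], labels)
```

Texts for which the function returns `None` would be collected into the new event text data set used by steps 5 to 8.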
Step 1 of the invention specifically comprises the following steps:
Step 1.1: determining the keywords of a specific news event according to the news event text data acquisition requirement;
Step 1.2: for the determined news event keywords, building a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquiring the news event text data stream;
Step 1.3: normalizing the text content of the acquired news event text data stream by removing invalid content such as spaces and splicing the remaining valid content into a standardized representation, recorded as one news text, to form a news event text set;
Step 1.4: dividing the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1.
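Steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the fixed random seed, the shuffling before the split, and the choice to give any rounding remainder to the test set are all assumptions, and the records are placeholder data.

```python
import random

def normalize(text):
    """Remove all whitespace and splice the remaining content together (step 1.3)."""
    return "".join(text.split())

def split_dataset(records, ratios=(7, 2, 1), seed=42):
    """Shuffle records and split them into train/val/test by the given ratio (step 1.4)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: split after a seeded shuffle
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # any rounding remainder goes to the test set
    return train, val, test

# Placeholder (label, text) records, for illustration only.
records = [("label_%d" % (i % 3), normalize(" news  text %d \n" % i)) for i in range(100)]
train, val, test = split_dataset(records)
```

With 100 records the split yields 70, 20 and 10 items respectively.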
Compared with the prior art, the invention vectorizes the news event text data set with BERT, so that the representation captures context more closely and more accurately; it uses a bidirectional long short-term memory (BiLSTM) network with an attention mechanism to learn from large volumes of online news text, making efficient and accurate use of the data; and it combines supervised and unsupervised methods, which is more efficient than using either alone. The method is simple, extracts semantic information in depth, and can analyze and mine online news text to discover new events, which helps regulatory departments and individual users keep track of new events in real time and supports follow-up work.
Description of the drawings:
Fig. 1 is a schematic diagram of the workflow of the invention.
Fig. 2 is a diagram of the new event discovery model constructed by the invention.
Fig. 3 is a diagram of the entity feature extraction model of the invention.
Fig. 4 is a flow chart of topic extraction according to the invention.
Detailed description:
The invention is further described below by way of example with reference to the accompanying drawings.
Example:
The process of extracting a new event topic in this embodiment of the invention comprises the following steps:
Step 1: acquiring a news event text data stream according to event keywords and constructing a news event text data set from the acquired stream, wherein each record in the data set comprises an event type label of the news text and a specific text description of the event, and dividing the news event text data set into a training set Train, a validation set Val and a test set Test, specifically as follows:
Step 1.1: determining the keywords of a specific news event according to the news event text data acquisition requirement;
Step 1.2: for the determined news event keywords, building a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquiring the news event text data stream;
Step 1.3: normalizing the text content of the acquired news event text data stream by removing invalid content such as spaces and splicing the remaining valid content into a standardized representation, recorded as one news text, to form a news event text set;
Step 1.4: dividing the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1;
Step 2: vectorizing the text of the training set Train, the validation set Val and the test set Test divided in step 1 with a BERT representation model to obtain a high-dimensional dense vector representation of the news event text data set, wherein the BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads, and the resulting dense vector has 768 dimensions, for example: [8.3772335e-05,3.9696515e-05,3.854327e-05,0.0018235502,0.00028364992,3.3392924e-05,3.613378e-05,0.0011939545,8.937488e-06,0.00028550622,1.6984109e-06,0.014312873,4.2274103e-05,0.0057512685,0.008945758,2.318987e-05,1.9686187e-05,3.6920403e-05, … ]
Step 3: taking the high-dimensional dense vector representation obtained in step 2 as input, initializing the neural network parameters with Xavier initialization, and training on the training set Train and the validation set Val, using a dropout strategy and gradient descent to update the neural network parameters and the input feature vectors, to obtain a new event discovery model built on a bidirectional long short-term memory (BiLSTM) network based on BERT and an attention mechanism;
Step 4: setting the threshold of the new event discovery model to 0.9; if the recognition result is greater than the threshold, judging that the event belongs to a known news event type and giving the topic of the event; if the prediction result is smaller than the threshold, judging the event to be a new event, and collecting and storing the news texts judged to be new events to obtain a new event text data set;
Step 5: removing useless information from the new event text data set obtained in step 4 while keeping the news event text's description of the news event, performing word segmentation with a Chinese word segmentation tool, and then establishing a custom dictionary to improve segmentation accuracy, to obtain the preprocessed result; the useless information comprises tokens without substantive value, such as special characters and stop words;
Step 6: extracting entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splicing them with the original text at word level to form a new news text description, and weighting the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: for the news text data set processed in step 6, calculating the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assigning each word a corresponding weight according to the result, giving a weight vector such as: 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.16767159442908067 …
Step 8: clustering the new event text data set obtained in step 7 into a plurality of events with the K-means algorithm, using the features and weights obtained in steps 6 and 7, and performing topic modeling analysis on the new events; combining the topic modeling analysis result with the TF-IDF representation of the new event text set, extracting ten keywords for each event as the topic words of the new event, and thereby completing the extraction of the new event topic. K-means new event topic extraction is an iterative process with four steps. First, k objects in the news text set are selected as initial centers, each representing a cluster center. Second, each data object in the sample is assigned, by the nearest-neighbor principle, to the class of the cluster center closest to it in Euclidean distance. Third, the mean of all objects in each class is taken as that class's new cluster center, and the value of the objective function is calculated. Fourth, the cluster centers and the objective function value are checked for change: if they have not changed, the result is output; if they have changed, the process returns to the second step. After clustering is complete, the keywords of each event category are extracted using the TF-IDF representation of the new event text.
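The four-step K-means iteration described in step 8 can be sketched in plain Python. This is an illustrative sketch, not the patent's implementation: the function signature, the `init_centers` parameter, the use of the sum of squared distances as the objective function, and the toy two-dimensional points are all assumptions. In the method itself the points would be the weighted text vectors.

```python
import math
import random

def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Cluster points (lists of floats); returns (centers, assignments, objective)."""
    # First step: select k objects as the initial cluster centers.
    if init_centers is None:
        init_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init_centers]
    prev_objective = None
    for _ in range(max_iter):
        # Second step: assign each object to the nearest center (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Third step: recompute each center as the mean of its members ...
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the old center for an empty cluster
        # ... and evaluate the objective (assumed here: sum of squared distances).
        objective = sum(
            math.dist(p, new_centers[a]) ** 2 for p, a in zip(points, assignments)
        )
        # Fourth step: stop once neither the centers nor the objective change.
        converged = new_centers == centers and objective == prev_objective
        centers, prev_objective = new_centers, objective
        if converged:
            break
    return centers, assignments, objective

# Toy data: two well-separated groups; fixed initial centers keep the run reproducible.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, assignments, objective = kmeans(points, k=2, init_centers=[points[0], points[3]])
```

Omitting `init_centers` falls back to a seeded random choice of k points, which matches the first step more literally but makes the outcome depend on the seed.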
Strategies, methods or algorithms not specifically described in this example are all available in the art.
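As an illustration of how steps 6 to 8 fit together, the sketch below boosts feature words by repeating them, computes TF-IDF weights, and picks the top-weighted words as keywords. It is a simplified sketch under stated assumptions: the function names, the `boost` count, and the tokenized example documents are hypothetical, the IDF uses the plain `log(N / df)` variant (other smoothed variants exist), and the entity list stands in for the output of an entity extraction model that is not shown.

```python
import math
from collections import Counter

def boost_features(tokens, features, boost=2):
    """Splice feature words onto the text, repeated `boost` times to raise their term frequency."""
    return tokens + [w for w in features for _ in range(boost)]

def tfidf(docs):
    """Return one {word: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency per word
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights

def top_keywords(word_weights, n=10):
    """Pick the n highest-weighted words; ties are broken alphabetically."""
    ranked = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]

# Hypothetical segmented news texts and entity features, for illustration only.
docs = [
    ["flood", "river", "city", "rescue"],
    ["flood", "dam", "rescue", "team"],
    ["election", "vote", "city"],
]
entities = ["river"]  # stand-in for entity extraction output on the first text
boosted = boost_features(docs[0], entities, boost=2)
weights = tfidf([boosted, docs[1], docs[2]])
top = top_keywords(weights[0], n=2)
```

In the method, keywords are extracted per event category after clustering, so the texts assigned to one cluster could be concatenated into a single token list before calling `top_keywords` with `n=10`.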
Claims (2)
1. A new event topic extraction method, characterized by comprising the following steps:
Step 1: acquiring a news event text data stream according to event keywords and constructing a news event text data set from the acquired stream, wherein each record in the data set comprises an event type label of the news text and a specific text description of the event, and dividing the news event text data set into a training set Train, a validation set Val and a test set Test;
Step 2: vectorizing the training set Train, the validation set Val and the test set Test divided in step 1 with a BERT representation model to obtain a high-dimensional dense vector representation of the news event text data set, wherein the BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads;
Step 3: taking the high-dimensional dense vector representation obtained in step 2 as input, initializing the neural network parameters with Xavier initialization, and training on the training set Train and the validation set Val, using a dropout strategy and gradient descent to update the neural network parameters and the input feature vectors, to obtain a new event discovery model;
Step 4: setting a threshold for the new event discovery model; if the recognition result is greater than the threshold, judging that the event belongs to a known news event type and giving the topic of the event; if the prediction result is smaller than the threshold, judging the event to be a new event, and collecting and storing the news texts judged to be new events to obtain a new event text data set;
Step 5: removing useless information from the new event text data set obtained in step 4 while keeping the news event text's description of the news event, performing word segmentation with a Chinese word segmentation tool, and then establishing a custom dictionary to improve segmentation accuracy; the useless information comprises special characters and stop words, which are tokens without substantive value;
Step 6: extracting entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splicing them with the original text at word level to form a new news text description, and weighting the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: for the news text data set processed in step 6, calculating the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assigning each word a corresponding weight according to the result;
Step 8: clustering the new event text data set obtained in step 7 into a plurality of events with the K-means algorithm, using the features and weights obtained in steps 6 and 7, and performing topic modeling analysis on the new events; combining the topic modeling analysis result with the TF-IDF representation of the new event text set, extracting ten keywords for each event as the topic words of the new event, and thereby completing the extraction of the new event topic.
2. The new event topic extraction method according to claim 1, wherein step 1 specifically comprises the following steps:
Step 1.1: determining the keywords of a specific news event according to the news event text data acquisition requirement;
Step 1.2: for the determined news event keywords, building a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquiring the news event text data stream;
Step 1.3: normalizing the text content of the acquired news event text data stream by removing invalid content, including spaces, and splicing the remaining valid content into a standardized representation, recorded as one news text, to form a news event text set;
Step 1.4: dividing the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010459853 | 2020-05-27 | ||
CN2020104598537 | 2020-05-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111597328A (en) | 2020-08-28 |
CN111597328B (en) | 2022-10-18 |
Family
ID=72191626
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010541567.5A | Active, CN111597328B (en) | 2020-05-27 | 2020-06-15 | New event theme extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597328B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112100038A (en) * | 2020-09-27 | 2020-12-18 | 北京有竹居网络技术有限公司 | Data delay monitoring method and device, electronic equipment and computer readable medium |
CN112199480A (en) * | 2020-09-18 | 2021-01-08 | 厦门快商通科技股份有限公司 | BERT model-based online dialog log violation detection method and system |
CN112269949A (en) * | 2020-10-19 | 2021-01-26 | 杭州叙简科技股份有限公司 | Information structuring method based on accident disaster news |
CN114841155A (en) * | 2022-04-21 | 2022-08-02 | 科技日报社 | Intelligent theme content aggregation method and device, electronic equipment and storage medium |
US20230096118A1 (en) * | 2021-09-27 | 2023-03-30 | Sap Se | Smart dataset collection system |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241610A (en) * | 2016-12-26 | 2018-07-03 | 上海神计信息系统工程有限公司 | A kind of online topic detection method and system of text flow |
CN108897857A (en) * | 2018-06-28 | 2018-11-27 | 东华大学 | The Chinese Text Topic sentence generating method of domain-oriented |
CN109635284A (en) * | 2018-11-26 | 2019-04-16 | 北京邮电大学 | Text snippet method and system based on deep learning associate cumulation attention mechanism |
CN109766544A (en) * | 2018-12-24 | 2019-05-17 | 中国科学院合肥物质科学研究院 | Document keyword abstraction method and device based on LDA and term vector |
CN110188172A (en) * | 2019-05-31 | 2019-08-30 | 清华大学 | Text based event detecting method, device, computer equipment and storage medium |
CN110245229A (en) * | 2019-04-30 | 2019-09-17 | 中山大学 | A kind of deep learning theme sensibility classification method based on data enhancing |
US10417350B1 (en) * | 2017-08-28 | 2019-09-17 | Amazon Technologies, Inc. | Artificial intelligence system for automated adaptation of text-based classification models for multiple languages |
CN110413863A (en) * | 2019-08-01 | 2019-11-05 | 信雅达系统工程股份有限公司 | A kind of public sentiment news duplicate removal and method for pushing based on deep learning |
CN110633409A (en) * | 2018-06-20 | 2019-12-31 | 上海财经大学 | Rule and deep learning fused automobile news event extraction method |
CN110781302A (en) * | 2019-10-23 | 2020-02-11 | 清华大学 | Method, device and equipment for processing event role in text and storage medium |
CN111078876A (en) * | 2019-12-04 | 2020-04-28 | 国家计算机网络与信息安全管理中心 | Short text classification method and system based on multi-model integration |
CN111143576A (en) * | 2019-12-18 | 2020-05-12 | 中科院计算技术研究所大数据研究院 | Event-oriented dynamic knowledge graph construction method and device |
CN111143549A (en) * | 2019-06-20 | 2020-05-12 | 东华大学 | Method for public sentiment emotion evolution based on theme |
Non-Patent Citations (3)
Title |
---|
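Zhang Xiuhua (张秀华) et al.: "Chinese news event detection and topic extraction based on convolutional neural networks and K-means" (基于卷积神经网络和 K-means 的中文新闻事件检测与主题提取), Science Technology and Engineering (《科学技术与工程》) * |
Luo Yin (罗引): "Research on Internet public opinion discovery and opinion mining techniques" (互联网舆情发现与观点挖掘技术研究), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology (《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》) * |
Xu Qiang (许强): "Research on Spark-based topic detection and tracking techniques" (基于Spark的话题检测与跟踪技术研究), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology (《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》) * |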
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112199480A (en) * | 2020-09-18 | 2021-01-08 | 厦门快商通科技股份有限公司 | BERT model-based online dialog log violation detection method and system |
CN112199480B (en) * | 2020-09-18 | 2022-12-06 | 厦门快商通科技股份有限公司 | BERT model-based online dialog log violation detection method and system |
CN112100038A (en) * | 2020-09-27 | 2020-12-18 | 北京有竹居网络技术有限公司 | Data delay monitoring method and device, electronic equipment and computer readable medium |
CN112269949A (en) * | 2020-10-19 | 2021-01-26 | 杭州叙简科技股份有限公司 | Information structuring method based on accident disaster news |
CN112269949B (en) * | 2020-10-19 | 2023-09-22 | 杭州叙简科技股份有限公司 | Information structuring method based on accident disaster news |
US20230096118A1 (en) * | 2021-09-27 | 2023-03-30 | Sap Se | Smart dataset collection system |
US11874798B2 (en) * | 2021-09-27 | 2024-01-16 | Sap Se | Smart dataset collection system |
CN114841155A (en) * | 2022-04-21 | 2022-08-02 | 科技日报社 | Intelligent theme content aggregation method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111597328B (en) | 2022-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111597328B (en) | New event theme extraction method | |
CN110134757B (en) | Event argument role extraction method based on multi-head attention mechanism | |
CN110598005A (en) | Public safety event-oriented multi-source heterogeneous data knowledge graph construction method | |
CN111291156A (en) | Question-answer intention identification method based on knowledge graph | |
CN111046670B (en) | Entity and relationship combined extraction method based on drug case legal documents | |
CN111985612B (en) | Encoder network model design method for improving video text description accuracy | |
CN110969023B (en) | Text similarity determination method and device | |
CN110175334A (en) | Text knowledge's extraction system and method based on customized knowledge slot structure | |
CN112836509A (en) | Expert system knowledge base construction method and system | |
CN105912525A (en) | Sentiment classification method for semi-supervised learning based on theme characteristics | |
CN111159332A (en) | Text multi-intention identification method based on bert | |
CN108536781B (en) | Social network emotion focus mining method and system | |
CN110287298A (en) | A kind of automatic question answering answer selection method based on question sentence theme | |
CN109543036A (en) | Text Clustering Method based on semantic similarity | |
CN114756678A (en) | Unknown intention text identification method and device | |
Wu et al. | Inferring users' emotions for human-mobile voice dialogue applications | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
CN116050419B (en) | Unsupervised identification method and system oriented to scientific literature knowledge entity | |
CN116451114A (en) | Internet of things enterprise classification system and method based on enterprise multisource entity characteristic information | |
CN111241812A (en) | Big data text clustering test method and system based on parallel improved K-means algorithm | |
CN112685374A (en) | Log classification method and device and electronic equipment | |
CN114065749A (en) | Text-oriented Guangdong language recognition model and training and recognition method of system | |
CN110162629B (en) | Text classification method based on multi-base model framework | |
CN114036289A (en) | Intention identification method, device, equipment and medium | |
CN113535928A (en) | Service discovery method and system of long-term and short-term memory network based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||