CN111597328A - New event theme extraction method - Google Patents

Publication number
CN111597328A
Authority
CN
China
Prior art keywords
event
news
text
text data
new event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010541567.5A
Other languages
Chinese (zh)
Other versions
CN111597328B (en)
Inventor
云红艳
贺英
张秀华
李正民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University
Original Assignee
Qingdao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-05-27
Filing date
2020-06-15
Publication date
2020-08-28
Application filed by Qingdao University
Publication of CN111597328A
Application granted
Publication of CN111597328B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention belongs to the technical field of network information and relates to a new event topic extraction method. A news event text data set is vectorized with BERT, so that the representation captures context more closely and is more accurate. A bidirectional long short-term memory (BiLSTM) network with an attention mechanism learns from the large volume of news text on the network and discovers new events, making efficient and accurate use of the data. The method combines supervised and unsupervised techniques, which is more efficient than using either alone. The method is simple, extracts deep semantic information, and can analyze and mine online news text to discover new events, which helps regulatory departments and individual users keep track of new events in real time and facilitates follow-up work.

Description

New event theme extraction method
Technical field:
The invention belongs to the technical field of network information and relates to a new event topic extraction method. In particular, it trains a new event discovery model using a bidirectional long short-term memory (BiLSTM) network based on BERT (Bidirectional Encoder Representations from Transformers) and an attention mechanism, and extracts new event topics through multi-feature-fusion topic modeling analysis.
Background art:
With the development of the internet in the big data era, people are surrounded by a large volume of news from many sources, such as newspapers and the web. Text is the most common carrier of news and the most accessible way to obtain valuable information. Because news from different sources takes many forms, the format and content of news text are often disorganized, and the volume of news generated is extremely large, so detecting Chinese news events entirely by hand is nearly impossible. At the same time, the large amount of text on the network reflects how much attention people pay to an event and how much influence it has, so mining online news text helps discover trending events as early as possible.
Hot news events have mainly been discovered by manual monitoring, which incurs a high resource cost for discovering and monitoring news events on the network. With the rise of machine learning, the event discovery methods in common use today rely on clustering: news text is clustered to find new events, but accuracy is low and misidentification is common. Neural networks have since achieved major results in many fields; they remove the limitation of hand-crafted features and are better suited to big data. CN201810696452.6 provides a domain-oriented method for generating topic sentences for Chinese text: a domain knowledge graph is built for a domain text data set, a deep neural network model extracts the semantic information of the text, the text is classified by topic sentence pattern, and finally the topic sentence of the text is generated. This method has the following disadvantages. First, it applies only to domain-specific data sets and does not suit general data sets across fields. Second, it requires building a domain knowledge graph, which is resource-intensive and demands substantial domain expertise. Finally, it labels and classifies the text data with deep learning, an operation that targets a specific field, so the model performs poorly on new data from a new field. It is therefore necessary to provide a new event topic extraction method that discovers new events with deep learning and extracts their topics with topic modeling.
Summary of the invention:
The invention aims to overcome the defects of the prior art. It designs and provides a method that trains a new event discovery model with a bidirectional long short-term memory (BiLSTM) network based on BERT (Bidirectional Encoder Representations from Transformers) and an attention mechanism, and extracts new event topics through multi-feature-fusion topic modeling analysis. It uses neural networks from deep learning to mine and process massive text data, enabling efficient and accurate analysis and use of the text data.
To achieve the above object, the process of extracting a new event topic according to the invention comprises the following steps:
Step 1: Acquire a news event text data stream according to event keywords and construct a news event text data set from it, in which each record contains the event type label of a news text and the text description of the event; divide the news event text data set into a training set Train, a validation set Val and a test set Test;
Step 2: Based on a BERT representation model, output high-dimensional dense vector representations for the training set Train, the validation set Val and the test set Test divided in step 1, obtaining a high-dimensional dense vector representation of the news event text data set, where the BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads;
Step 3: Take the high-dimensional dense vector representation of the news event text data set obtained in step 2 as input; using the training set Train and the validation set Val, initialize the neural network parameters with Xavier initialization, and update the neural network parameters and input feature vectors with a dropout strategy and gradient descent to obtain a new event discovery model;
Step 4: Set a threshold for the new event discovery model. If the recognition result is greater than the threshold, the event is judged to belong to a known news event type and its topic is given; if the prediction result is smaller than the threshold, the event is judged to be a new event, and the news texts judged to be new events are collected and stored to obtain a new event text data set;
Step 5: Remove the useless information contained in the new event text data set obtained in step 4, keeping the content of the news text that describes the news event; segment the text with a Chinese word segmentation tool and then build a custom dictionary to improve segmentation accuracy; the useless information comprises tokens with no substantive value, such as special characters and stop words;
Step 6: Extract entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splice them with the original text at the word level to form a new news text description, and weight the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: For the news text data set processed in step 6, compute the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assign each word a corresponding weight according to the result;
Step 8: Using the features and weights obtained in steps 6 and 7, cluster the new event text data set obtained in step 7 into a number of events with the K-means algorithm and perform topic modeling analysis on the new events; combine the topic modeling result with the TF-IDF representation of the new event text set, and extract ten keywords from each event as the topic words of the new event, completing the extraction of the new event topic.
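The decision rule of step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented model: it assumes the "recognition result" is the largest softmax probability over the known event types. The function name `classify_or_flag_new`, the label `NEW_EVENT`, and the default threshold of 0.9 (the value used in the embodiment) are choices made for the example. The source does not say how a score exactly equal to the threshold is handled; the sketch treats it as a new event.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_flag_new(scores, labels, threshold=0.9):
    """Return (label, confidence); the label is NEW_EVENT when confidence is not above the threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return labels[best], probs[best]   # known event type
    return "NEW_EVENT", probs[best]        # assumption: ties at the threshold count as new

labels = ["earthquake", "election", "sports"]  # hypothetical known event types
print(classify_or_flag_new([8.0, 1.0, 0.5], labels))   # confident prediction
print(classify_or_flag_new([1.2, 1.0, 0.9], labels))   # uncertain prediction
```

Texts flagged as `NEW_EVENT` would then be collected into the new event text data set that the later steps process.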
Step 1 of the invention specifically comprises the following steps:
Step 1.1: Determine the keywords of a specific news event according to the news event text data acquisition requirements;
Step 1.2: For the determined news event keywords, build a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquire the news event text data stream;
Step 1.3: Normalize the text content of the acquired news event text data stream, remove invalid content such as spaces, and splice the remaining valid content into a normalized representation recorded as one news text, forming a news event text set;
Step 1.4: Divide the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1.
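The 7:2:1 division in step 1.4 might look like the sketch below. The source gives only the ratio; shuffling before the split, the fixed seed, and the function name `split_dataset` are assumptions made for illustration. Any records left over after integer division go to the test set.

```python
import random

def split_dataset(records, ratios=(7, 2, 1), seed=0):
    """Shuffle the records and split them into train/val/test by the given ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: the split is random, with a fixed seed
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test

# Hypothetical records: (event type label, news text)
records = [("type_%d" % (i % 3), "news text %d" % i) for i in range(100)]
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))  # 70 20 10
```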
Compared with the prior art, the invention vectorizes the news event text data set with BERT, so that the representation captures context more closely and is more accurate. It uses a bidirectional long short-term memory (BiLSTM) network with an attention mechanism to learn from the large volume of news text on the network, making efficient and accurate use of the data. It combines supervised and unsupervised methods, which is more efficient than using either alone. The method is simple, extracts deep semantic information, and can analyze and mine online news text to discover new events, which helps regulatory departments and individual users keep track of new events in real time and facilitates follow-up work.
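The source names an attention mechanism on top of the bidirectional network but does not specify its form. As a rough sketch of one common variant, the following pools a sequence of hidden states into a single vector using dot-product scores against a query vector followed by a softmax. The function name `attention_pool`, the query vector, and the scoring function are all assumptions made for illustration; in a trained model the query would be learned.

```python
import math

def attention_pool(hidden_states, query):
    """Weight each hidden state by softmax(dot(state, query)) and return (weights, weighted sum)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(state, query)) for state in hidden_states]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Three hypothetical 2-dimensional hidden states, one per token.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(states, query=[0.5, 2.0])
print(weights)   # one weight per token, summing to 1
print(context)   # pooled sentence vector
```

The pooled vector would feed a classification layer whose output is compared against the threshold when deciding whether a text describes a known or a new event.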
Description of the drawings:
Fig. 1 is a schematic diagram of the workflow of the invention.
Fig. 2 is a diagram of the new event discovery model constructed by the invention.
Fig. 3 is a diagram of the entity feature extraction model of the invention.
Fig. 4 is a flow chart of the topic extraction process of the invention.
Detailed description:
The invention is further described below by way of an embodiment with reference to the accompanying drawings.
Embodiment:
The process for extracting a new event topic in this embodiment comprises the following steps:
Step 1: Acquire a news event text data stream according to event keywords and construct a news event text data set from it, in which each record contains the event type label of a news text and the text description of the event; divide the news event text data set into a training set Train, a validation set Val and a test set Test, specifically as follows:
Step 1.1: Determine the keywords of a specific news event according to the news event text data acquisition requirements;
Step 1.2: For the determined news event keywords, build a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquire the news event text data stream;
Step 1.3: Normalize the text content of the acquired news event text data stream, remove invalid content such as spaces, and splice the remaining valid content into a normalized representation recorded as one news text, forming a news event text set;
Step 1.4: Divide the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1;
Step 2: Vectorize the text of the training set Train, the validation set Val and the test set Test divided in step 1 with a BERT representation model and output high-dimensional dense vector representations, obtaining the high-dimensional dense vector representation of the news event text data set. The BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads, and the resulting dense vector has 768 dimensions, for example: [8.3772335e-05,3.9696515e-05,3.854327e-05,0.0018235502,0.00028364992,3.3392924e-05,3.613378e-05,0.0011939545,8.937488e-06,0.00028550622,1.6984109e-06,0.014312873,4.2274103e-05,0.0057512685,0.008945758,2.318987e-05,1.9686187e-05,3.6920403e-05, … ]
Step 3: Take the high-dimensional dense vector representation of the news event text data set obtained in step 2 as input; using the training set Train and the validation set Val, initialize the neural network parameters with Xavier initialization, and update the neural network parameters and input feature vectors with a dropout strategy and gradient descent to obtain a new event discovery model built on a bidirectional long short-term memory (BiLSTM) network based on BERT and an attention mechanism;
Step 4: Set the threshold of the new event discovery model to 0.9. If the recognition result is greater than the threshold, the event is judged to belong to a known news event type and its topic is given; if the prediction result is smaller than the threshold, the event is judged to be a new event, and the news texts judged to be new events are collected and stored to obtain a new event text data set;
Step 5: Remove the useless information contained in the new event text data set obtained in step 4, keeping the content of the news text that describes the news event; segment the text with a Chinese word segmentation tool and then build a custom dictionary to improve segmentation accuracy; the useless information comprises tokens with no substantive value, such as special characters and stop words, and removing it yields the preprocessed result;
Step 6: Extract entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splice them with the original text at the word level to form a new news text description, and weight the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: For the news text data set processed in step 6, compute the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assign each word a corresponding weight according to the result, giving a weight vector such as: 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.11178106295272044, 0.16767159442908067 …
Step 8: Using the features and weights obtained in steps 6 and 7, cluster the new event text data set obtained in step 7 into a number of events with the K-means algorithm and perform topic modeling analysis on the new events; combine the topic modeling result with the TF-IDF representation of the new event text set, and extract ten keywords from each event as the topic words of the new event, completing the extraction of the new event topic. K-means new event topic extraction is an iterative process with four steps. First, k objects in the news text set are selected as initial centers, each representing one cluster center. Second, each data object in the sample is assigned, by the nearest-neighbor principle, to the class of the cluster center closest to it in Euclidean distance. Third, the mean of all objects in each class is taken as the new cluster center of that class, and the value of the objective function is computed. Finally, the cluster centers and the objective function value are checked for change: if neither has changed, the result is output; otherwise the process returns to the second step. After clustering finishes, the keywords of each event category are extracted using the TF-IDF representation of the new event text.
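The four-step K-means loop described in step 8 can be sketched in plain Python as follows. The objective function is taken here to be the sum of squared Euclidean distances from each point to its cluster center, which is the usual K-means objective but is an assumption, since the source does not define it. The function names, the random choice of initial centers, and the `max_iter` safety cap are also illustrative.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same way as the distance itself."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centers, objective)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]      # 1. pick k objects as initial centers
    previous = None
    for _ in range(max_iter):
        # 2. assign every point to its nearest center
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c])) for p in points]
        # 3. recompute each center as the mean of its members, then the objective
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])               # keep the center of an empty cluster
        objective = sum(squared_distance(p, new_centers[lab]) for p, lab in zip(points, labels))
        centers = new_centers
        # 4. stop when neither the centers nor the objective changed
        if previous == (centers, objective):
            break
        previous = (centers, objective)
    return labels, centers, objective

# Hypothetical 2-dimensional feature vectors forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers, objective = kmeans(points, k=2)
print(labels, objective)
```

In the method described above, each point would be the weighted TF-IDF vector of one news text rather than a 2-dimensional toy vector.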
Strategies, methods and algorithms not specifically described in this embodiment are all existing techniques in the art.
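As one illustration of how steps 6 to 8 fit together, the sketch below raises the term frequency of feature words by splicing extra copies onto a tokenized text, computes TF-IDF weights, and keeps the highest-weighted words of one event as its topic words. Everything specific here is an assumption: the function names, the number of extra copies, and the smoothed IDF variant `log((1 + N) / (1 + df)) + 1` are not given in the source, and real entity and LDA hot-word extraction is replaced by a hand-written feature list.

```python
import math
from collections import Counter

def boost_features(tokens, features, extra_copies=2):
    """Splice feature words onto the token list to raise their term frequency."""
    return list(tokens) + [word for word in features for _ in range(extra_copies)]

def tf_idf(docs):
    """Return one {word: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return weights

def top_keywords(event_weights, k=10):
    """Sum TF-IDF weights over the documents of one event and keep the k largest."""
    totals = Counter()
    for doc_weights in event_weights:
        totals.update(doc_weights)  # Counter.update adds the values of a mapping
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

# Hypothetical segmented texts; "city" stands in for an extracted place entity.
docs = [boost_features(["flood", "city", "rescue"], ["city"]), ["flood", "rain"]]
weights = tf_idf(docs)
print(top_keywords(weights[:1], k=2))  # ['city', 'rescue']
```

With `k=10` and the documents of one K-means cluster passed as `event_weights`, this yields ten topic words per event.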

Claims (2)

1. A new event topic extraction method, characterized by comprising the following steps:
Step 1: acquiring a news event text data stream according to event keywords and constructing a news event text data set from it, in which each record contains the event type label of a news text and the text description of the event, and dividing the news event text data set into a training set Train, a validation set Val and a test set Test;
Step 2: based on a BERT representation model, outputting high-dimensional dense vector representations for the training set Train, the validation set Val and the test set Test divided in step 1 to obtain a high-dimensional dense vector representation of the news event text data set, wherein the BERT representation model has 12 layers, a hidden size of 768 and 12 attention heads;
Step 3: taking the high-dimensional dense vector representation of the news event text data set obtained in step 2 as input, using the training set Train and the validation set Val, initializing the neural network parameters with Xavier initialization, and updating the neural network parameters and input feature vectors with a dropout strategy and gradient descent to obtain a new event discovery model;
Step 4: setting a threshold for the new event discovery model; if the recognition result is greater than the threshold, judging that the event belongs to a known news event type and giving its topic; if the prediction result is smaller than the threshold, judging the event to be a new event, and collecting and storing the news texts judged to be new events to obtain a new event text data set;
Step 5: removing the useless information contained in the new event text data set obtained in step 4, keeping the content of the news text that describes the news event, segmenting the text with a Chinese word segmentation tool and then building a custom dictionary to improve segmentation accuracy; the useless information comprises special characters and stop words, which are tokens with no substantive value;
Step 6: extracting entity features and LDA topic hot-word features from the preprocessed new event text data set obtained in step 5, splicing them with the original text at the word level to form a new news text description, and weighting the entity features and LDA topic hot-word features by increasing their term frequency; the entity features comprise person, place and organization name entity features;
Step 7: for the news text data set processed in step 6, computing the term frequency-inverse document frequency (TF-IDF) of each word to measure its importance relative to the current topic, and assigning each word a corresponding weight according to the result;
Step 8: using the features and weights obtained in steps 6 and 7, clustering the new event text data set obtained in step 7 into a number of events with the K-means algorithm and performing topic modeling analysis on the new events; combining the topic modeling result with the TF-IDF representation of the new event text set, and extracting ten keywords from each event as the topic words of the new event, completing the extraction of the new event topic.
2. The new event topic extraction method according to claim 1, wherein step 1 specifically comprises the following steps:
Step 1.1: determining the keywords of a specific news event according to the news event text data acquisition requirements;
Step 1.2: for the determined news event keywords, building a data crawler system on the Scrapy framework that obtains links to news event text data through the Baidu search engine, and acquiring the news event text data stream;
Step 1.3: normalizing the text content of the acquired news event text data stream, removing invalid content including spaces, and splicing the remaining valid content into a normalized representation recorded as one news text, forming a news event text set;
Step 1.4: dividing the news event text set obtained in step 1.3 into a training set Train, a validation set Val and a test set Test in the ratio 7:2:1.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010459853 2020-05-27
CN2020104598537 2020-05-27

Publications (2)

Publication Number Publication Date
CN111597328A CN111597328A (en) 2020-08-28
CN111597328B CN111597328B (en) 2022-10-18

Family

ID=72191626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010541567.5A Active CN111597328B (en) 2020-05-27 2020-06-15 New event theme extraction method

Country Status (1)

Country Link
CN (1) CN111597328B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241610A (en) * 2016-12-26 2018-07-03 上海神计信息系统工程有限公司 A kind of online topic detection method and system of text flow
US10417350B1 (en) * 2017-08-28 2019-09-17 Amazon Technologies, Inc. Artificial intelligence system for automated adaptation of text-based classification models for multiple languages
CN110633409A (en) * 2018-06-20 2019-12-31 上海财经大学 Rule and deep learning fused automobile news event extraction method
CN108897857A (en) * 2018-06-28 2018-11-27 东华大学 The Chinese Text Topic sentence generating method of domain-oriented
CN109635284A (en) * 2018-11-26 2019-04-16 北京邮电大学 Text snippet method and system based on deep learning associate cumulation attention mechanism
CN109766544A (en) * 2018-12-24 2019-05-17 中国科学院合肥物质科学研究院 Document keyword abstraction method and device based on LDA and term vector
CN110245229A (en) * 2019-04-30 2019-09-17 中山大学 A kind of deep learning theme sensibility classification method based on data enhancing
CN110188172A (en) * 2019-05-31 2019-08-30 清华大学 Text based event detecting method, device, computer equipment and storage medium
CN111143549A (en) * 2019-06-20 2020-05-12 东华大学 Method for public sentiment emotion evolution based on theme
CN110413863A (en) * 2019-08-01 2019-11-05 信雅达系统工程股份有限公司 A kind of public sentiment news duplicate removal and method for pushing based on deep learning
CN110781302A (en) * 2019-10-23 2020-02-11 清华大学 Method, device and equipment for processing event role in text and storage medium
CN111078876A (en) * 2019-12-04 2020-04-28 国家计算机网络与信息安全管理中心 Short text classification method and system based on multi-model integration
CN111143576A (en) * 2019-12-18 2020-05-12 中科院计算技术研究所大数据研究院 Event-oriented dynamic knowledge graph construction method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张秀华 等: ""基于卷积神经网络和 K-means 的中文新闻事件检测与主题提取"", 《科学技术与工程》 *
罗引: ""互联网舆情发现与观点挖掘技术研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
许强: ""基于Spark的话题检测与跟踪技术研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199480A (en) * 2020-09-18 2021-01-08 厦门快商通科技股份有限公司 BERT model-based online dialog log violation detection method and system
CN112199480B (en) * 2020-09-18 2022-12-06 厦门快商通科技股份有限公司 BERT model-based online dialog log violation detection method and system
CN112100038A (en) * 2020-09-27 2020-12-18 北京有竹居网络技术有限公司 Data delay monitoring method and device, electronic equipment and computer readable medium
CN112269949A (en) * 2020-10-19 2021-01-26 杭州叙简科技股份有限公司 Information structuring method based on accident disaster news
CN112269949B (en) * 2020-10-19 2023-09-22 杭州叙简科技股份有限公司 Information structuring method based on accident disaster news
US20230096118A1 (en) * 2021-09-27 2023-03-30 Sap Se Smart dataset collection system
US11874798B2 (en) * 2021-09-27 2024-01-16 Sap Se Smart dataset collection system
CN114841155A (en) * 2022-04-21 2022-08-02 科技日报社 Intelligent theme content aggregation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111597328B (en) 2022-10-18

Similar Documents

Publication Publication Date Title
CN111597328B (en) New event theme extraction method
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN111291156A (en) Question-answer intention identification method based on knowledge graph
CN111046670B (en) Entity and relationship combined extraction method based on drug case legal documents
CN111985612B (en) Encoder network model design method for improving video text description accuracy
CN110969023B (en) Text similarity determination method and device
CN110175334A (en) Text knowledge's extraction system and method based on customized knowledge slot structure
CN112836509A (en) Expert system knowledge base construction method and system
CN105912525A (en) Sentiment classification method for semi-supervised learning based on theme characteristics
CN111159332A (en) Text multi-intention identification method based on bert
CN108536781B (en) Social network emotion focus mining method and system
CN110287298A (en) A kind of automatic question answering answer selection method based on question sentence theme
CN109543036A (en) Text Clustering Method based on semantic similarity
CN114756678A (en) Unknown intention text identification method and device
Wu et al. Inferring users' emotions for human-mobile voice dialogue applications
CN113886562A (en) AI resume screening method, system, equipment and storage medium
CN116050419B (en) Unsupervised identification method and system oriented to scientific literature knowledge entity
CN116451114A (en) Internet of things enterprise classification system and method based on enterprise multisource entity characteristic information
CN111241812A (en) Big data text clustering test method and system based on parallel improved K-means algorithm
CN112685374A (en) Log classification method and device and electronic equipment
CN114065749A (en) Text-oriented Guangdong language recognition model and training and recognition method of system
CN110162629B (en) Text classification method based on multi-base model framework
CN114036289A (en) Intention identification method, device, equipment and medium
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant