WO2022134794A1 - Public opinion processing method and apparatus for news events, storage medium, and computer device - Google Patents


Info

Publication number
WO2022134794A1
Authority
WO
WIPO (PCT)
Prior art keywords
news
public opinion
classification
event
text
Application number
PCT/CN2021/124890
Other languages
English (en)
French (fr)
Inventor
Zhao Liang (赵亮)
Original Assignee
Shenzhen OneConnect Smart Technology Co., Ltd. (深圳壹账通智能科技有限公司)
Application filed by Shenzhen OneConnect Smart Technology Co., Ltd. (深圳壹账通智能科技有限公司)
Publication of WO2022134794A1 (patent/WO2022134794A1/zh)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method and apparatus for processing public opinion on news events, a storage medium, and a computer device.
  • Existing public opinion systems generally collect public opinion information directly from online news content, and the crawled public opinion information is then analyzed and processed manually to mine useful information from the news content.
  • News content on the various Internet websites is highly scattered, so enterprises of different types cannot quickly find the information they need.
  • Specialized personnel are required to carry out long-term, high-intensity data sorting, analysis, and processing before the required information can be obtained.
  • As a result, processing the collected news public opinion information takes a long time, which hurts the timeliness of the information and its usefulness to the enterprise, consumes substantial human resources, and lowers the enterprise's efficiency in processing public opinion information about news content.
  • The present application provides a method and apparatus for processing public opinion on news events, a storage medium, and a computer device, mainly to solve the problem that existing public opinion processing of news events is inefficient.
  • A public opinion processing method for news events is provided, in which:
  • the news public opinion information is information including the text content of each news event;
  • the first classification mark is determined by automatic marking from training sample sets for different public opinion requirements during training of the first text classification model.
  • A public opinion processing apparatus for news events is provided, comprising:
  • an acquisition module, configured to acquire the collected news public opinion information, where the news public opinion information is information containing the text content of each news event;
  • a first processing module, configured to perform first classification processing on the news public opinion information according to a trained first text classification model;
  • a second processing module, configured to extract a trained second text classification model that matches the first classification mark obtained by the first classification processing, and to perform, according to the second text classification model, second classification processing on the news public opinion information marked with that first classification mark, where the first classification mark is determined by automatic marking from training sample sets for different public opinion requirements during training of the first text classification model;
  • an output module, configured to extract news event content from the news public opinion information whose second classification mark has been determined by the second classification processing, map the news event content, in combination with the first classification mark and the second classification mark, to the corresponding node in an event graph matching the public opinion requirement, and output it.
  • A storage medium is provided, storing at least one executable instruction that causes a processor to perform operations corresponding to the above public opinion processing method for news events.
  • A computer device is provided, comprising a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
  • the memory stores at least one executable instruction that causes the processor to perform operations corresponding to the above public opinion processing method for news events.
  • The present application reduces the consumption of human resources and improves the efficiency of processing news public opinion, thereby improving the efficiency of public opinion processing for news events.
  • FIG. 1 shows a flowchart of a public opinion processing method for news events according to an embodiment of the present application;
  • FIG. 2 shows a block diagram of a public opinion processing apparatus for news events according to an embodiment of the present application;
  • FIG. 3 shows a schematic structural diagram of a computer device according to an embodiment of the present application.
  • An embodiment of the present application provides a public opinion processing method for news events. As shown in FIG. 1, the method includes:
  • The news public opinion information is information including the text content of each news event; it may contain the entire text or part of the text of a news event, and is collected by the public opinion system.
  • This is not specifically limited in the embodiments of the present application.
  • In step 101, the collected news public opinion information is obtained from the news public opinion information that the public opinion system has collected and stored, and it is then subjected to public opinion processing.
  • The first text classification model may be any machine learning model with a classification function, for example a neural network model or a support vector machine model; this is not specifically limited in the embodiments of the present application.
  • The first text classification model classifies according to the text features of the news public opinion information. Therefore, before the first classification processing, natural language processing is performed on the news public opinion information to convert it into word vectors, which are then classified by the trained first text classification model; this is not specifically limited in the embodiments of the present application.
  • Each text is marked with its class at the same time as it is classified.
  • Classification according to the words in a news event may yield, for example, a business opportunity mark, a risk mark, or a competition mark.
  • These classification marks are the ones already labeled in the training sample set when the first text classification model was trained, so they are obtained when the collected news public opinion information is classified for the first time. The first classification processing in this embodiment is therefore a coarse, broad-category classification.
  • The news public opinion information then undergoes second classification processing.
  • The first classification processing yields the first classification mark, such as the business opportunity mark, risk mark, or competition mark of step 102.
  • Each first classification mark is matched in advance with its own second text classification model for training, so that a fine-grained classification is performed again within each broad category.
  • The second text classification model may be any machine learning model, the same as or different from the first text classification model, for example a neural network model or a support vector machine model; this is not specifically limited in the embodiments of the present application.
  • The second text classification models are trained on the training sample sets corresponding to the different first classification marks, so the matching second text classification model can be selected according to the first classification mark, and that model then performs second classification processing on the news public opinion information marked with the first classification mark.
  • The first classification mark is determined automatically from the training sample sets for different public opinion requirements during training of the first text classification model. That is, opportunity marks, risk marks, and competition marks are tagged automatically in the training sample set according to the public opinion requirements: a clustering algorithm determines clustering features based on the public opinion requirements, and the news event samples in the training sample set are then marked.
  • This is not specifically limited in the embodiments of the present application.
  • The second classification mark is obtained by classifying the news public opinion information again on the basis of its first classification mark, so one first classification mark can be subdivided into multiple second classification marks. For example, the second classification marks under the business opportunity mark may include enterprise-related new technology, enterprise merger news, and discovery of new markets and new sales channels; this is not specifically limited in the embodiments of the present application.
  • The final news public opinion information therefore carries two classification marks, and the extracted news event content is mapped to nodes in the event graph for output.
  • An event graph is an event-logic knowledge base that describes the evolution rules and patterns among events.
  • Structurally, an event graph is a directed cyclic graph in which nodes represent events and directed edges represent logical relations between events, such as sequential, causal, conditional, and hypernym-hyponym (upper-lower) relations. To present the event content of news events carrying different classification marks to users accurately and efficiently, and to provide news early warning, the event content is mapped logically onto the corresponding nodes of the event graph.
  • Further, the training method of the first text classification model is defined, and the method further includes: constructing a three-layer convolutional neural network model and extracting, based on three preset kernel feature values, feature information from the news text content in the training sample set that has been marked with first classification marks; performing feature screening on the feature information through a pooling layer, splicing the screened feature vectors, and training the three-layer convolutional neural network model with the training sample set; and optimizing the model with the Adam optimizer during training until training is complete, which yields the first text classification model.
  • Specifically, a convolutional neural network model is chosen as the first text classification model to be trained: a three-layer convolutional neural network model is constructed, and features are extracted from the training news texts with convolution kernels.
  • The model includes a three-dimensional input layer, a convolution layer, a pooling layer, a fully connected (dense) layer, and an output layer; the features extracted by the three convolution kernels are screened by max pooling, and the resulting feature vectors are spliced.
  • Max pooling takes the feature values produced by one filter of the convolution layer, keeps only the largest one as the retained value, and discards all the others.
  • Further, the method includes: obtaining the news text content to be marked and determining the public opinion requirement; determining the k value for K-means clustering according to the public opinion requirement and clustering the news text content; extracting, from each resulting cluster, feature words whose number of occurrences exceeds a preset threshold as the first classification mark content; and performing first mark classification on the clusters based on that content.
  • The news text content to be marked is stored in the training sample set as text to be trained on. The public opinion requirement is the number of categories entered directly by the user, for example two categories such as competition and xx companies, and it determines the k value, so that marking is performed automatically in line with the public opinion requirement.
  • After clustering, each resulting cluster can be treated as the text content of one category, and the mark corresponding to that text content still has to be determined.
  • For example, risk may be determined as such a classification mark; this is not specifically limited in the embodiments of the present application.
  • Specifically, K-means clustering proceeds as follows: determine the k value according to the public opinion requirement; randomly select k data points from the data set as centroids; for each point in the data set, compute its distance to each centroid (for example, the Euclidean distance) and assign it to the set of the nearest centroid; once all data have been assigned, there are k sets in total.
  • Further, a training method for the second text classification model is defined, and the method further includes: constructing a two-layer convolutional neural network model and extracting, based on two preset kernel feature values, feature information from the news text content in the training sample set that carries second classification marks belonging to a first classification mark, where each first classification mark matches at least one distinct second classification mark; performing feature screening on the feature information through a pooling layer, splicing the screened feature vectors, and training the two-layer convolutional neural network model with the training sample set; and optimizing the model with the Adam optimizer during training until training is complete, which yields the second text classification model.
  • Specifically, a convolutional neural network model is chosen as the second text classification model to be trained: a two-layer convolutional neural network model is constructed, and features are extracted from the training news texts with convolution kernels.
  • The model includes a two-dimensional input layer, a convolution layer, a pooling layer, a fully connected (dense) layer, and an output layer. The features extracted by the two convolution kernels are screened by max pooling and the feature vectors are spliced; they then pass through a dense layer with a dropout rate of 0.2 per neural unit and a softmax activation for three-class classification. Finally, the CNN model is optimized with the Adam optimizer at a learning rate of 0.0001, the same optimization method used to train the first text classification model, so it is not repeated here.
  • Further, the method includes: receiving entered public opinion keywords, which are associated with the public opinion requirement; searching the public opinion information database collected within a preset time interval for news public opinion information matching the keywords; and, when the number of public opinion keyword matches in a piece of found news public opinion information exceeds a preset threshold, determining it as the collected news public opinion information.
  • The public opinion keywords are associated with the public opinion requirement. For example, if 3 public opinion requirements are entered, there may be more than 3 keywords, which improves the accuracy of public opinion processing.
  • The public opinion system stores the collected news public opinion information in the public opinion information database. When public opinion processing is required, news public opinion information is retrieved from the database at preset time intervals and matched against the public opinion keywords; for example, the keyword "risk" retrieves all news public opinion information containing the word "risk".
  • Matching news public opinion information, such as news public opinion information 1, is then used as the collected news public opinion information; this is not specifically limited in the embodiments of the present application.
  • Mapping the news event content, in combination with the first and second classification marks, to the corresponding node in the event graph matching the public opinion requirement includes: defining the events of the event graph based on the public opinion keywords and extracting event content from the news event content; extracting the event relations in the event content according to chronological order, causality, and upper-lower relations; establishing the nodes and node relations according to chronological order, causality, and upper-lower relations to construct the event graph; and writing the news event content marked with the first and second classification marks at the node corresponding to its event content in the event graph.
  • Defining the events of the event graph based on the public opinion keywords means determining the first-level event words to be constructed; for example, for the keyword "competition", the events are the event content of the news public opinion information concerning competition.
  • Event content can be extracted from the news event content with natural language processing; it is the content that captures the core of the whole news event. For example, if the news event concerns malicious competition by three business giants against Apple sales in a certain place, the extracted event content is "malicious competition in Apple sales"; this is not specifically limited in the embodiments of the present application.
  • Each node stores one piece of news public opinion information.
  • The network nodes are arranged layer by layer in chronological order, and the nodes storing the event content of each piece of news public opinion information are connected in sequence according to causal and upper-lower relations; this is not specifically limited in the embodiments of the present application.
  • Further, the method includes: when a query request for any node in the event graph is received, collecting statistics on the event content of each node that has a node relationship with that node, up to a preset query level, and outputting the result.
  • For example, the user can trigger a query request for any node in the event graph with the mouse; when the front end receives the request, it collects statistics on the event content of each node related to the queried node up to the preset query level and outputs them.
  • The preset query level determines how many preceding and following nodes are queried from the given node, preferably two levels up and two levels down; the event content in each of those nodes is then counted and output.
  • A node that has a node relationship with a given node is one reachable through the upper and lower levels; this is not specifically limited in the embodiments of the present application.
  • In summary, the embodiment of the present application provides a public opinion processing method for news events.
  • The collected news public opinion information, which includes the text content of each news event, is obtained; first classification processing is performed on it with a trained first text classification model; a trained second text classification model matching the resulting first classification mark is extracted and performs second classification processing on the news public opinion information carrying that first classification mark, where the first classification mark is determined by automatic marking from training sample sets for different public opinion requirements during training of the first text classification model; and news event content is extracted from the news public opinion information whose second classification mark has been determined, mapped in combination with the first and second classification marks to the corresponding node in the event graph matching the public opinion requirement, and output. This meets different enterprise users' needs for accurate news public opinion processing, greatly reduces the consumption of human resources, and improves the efficiency of public opinion processing for news events.
  • An embodiment of the present application provides a public opinion processing apparatus for news events. As shown in FIG. 2, the apparatus includes:
  • an acquisition module 21, configured to acquire the collected news public opinion information, which is information including the text content of each news event;
  • a first processing module 22, configured to perform first classification processing on the news public opinion information according to the trained first text classification model;
  • a second processing module 23, configured to extract a trained second text classification model matching the first classification mark obtained by the first classification processing, and to perform, according to that model, second classification processing on the news public opinion information marked with the first classification mark, where the first classification mark is determined by automatic marking from training sample sets for different public opinion requirements during training of the first text classification model;
  • an output module 24, configured to extract news event content from the news public opinion information whose second classification mark has been determined by the second classification processing, map the news event content, in combination with the first and second classification marks, to the corresponding node in the event graph matching the public opinion requirement, and output it.
  • The apparatus further includes:
  • a building module, configured to construct a three-layer convolutional neural network model and extract, based on three preset kernel feature values, feature information from the news text content in the training sample set that has been marked with first classification marks;
  • a training module, configured to perform feature screening on the feature information through a pooling layer, splice the screened feature vectors, and train the three-layer convolutional neural network model with the training sample set;
  • an optimization module, configured to optimize the three-layer convolutional neural network model with the Adam optimizer during training until training is complete, which yields the first text classification model.
  • The apparatus further includes:
  • a first determination module, configured to obtain the news text content to be marked and determine the public opinion requirement;
  • a clustering module, configured to determine the k value for K-means clustering according to the public opinion requirement, cluster the news text content, and extract, from each resulting cluster, feature words whose number of occurrences exceeds a preset threshold as the first classification mark content;
  • a marking module, configured to perform first mark classification on the resulting clusters based on the first classification mark content.
  • The building module is further configured to construct a two-layer convolutional neural network model and extract, based on two preset kernel feature values, feature information from the news text content in the training sample set that carries second classification marks belonging to a first classification mark, where each first classification mark matches at least one distinct second classification mark;
  • the training module is further configured to perform feature screening on the feature information through a pooling layer, splice the screened feature vectors, and train the two-layer convolutional neural network model with the training sample set;
  • the optimization module is further configured to optimize the two-layer convolutional neural network model with the Adam optimizer during training until training is complete, which yields the second text classification model.
  • The apparatus further includes:
  • a receiving module, configured to receive entered public opinion keywords, which are associated with the public opinion requirement;
  • a search module, configured to search the public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords;
  • a second determination module, configured to determine found news public opinion information as the collected news public opinion information when its number of public opinion keyword matches exceeds a preset threshold.
  • The output module includes:
  • an extraction unit, configured to define the events of the event graph based on the public opinion keywords and extract event content from the news event content;
  • a computing unit, configured to extract the event relations in the event content according to chronological order, causality, and upper-lower relations;
  • a construction unit, configured to establish the nodes and node relations according to chronological order, causality, and upper-lower relations, and construct the event graph;
  • a writing unit, configured to write the news event content marked with the first and second classification marks into the node corresponding to its event content in the event graph.
  • The apparatus further includes:
  • a statistics module, configured to, when a query request for any node in the event graph is received, collect statistics on the event content of each node having a node relationship with that node up to a preset query level, and output the result.
  • The embodiment of the present application provides a public opinion processing apparatus for news events.
  • a storage medium stores at least one executable instruction, and the computer-executable instruction can execute the public opinion processing method for a news event in any of the foregoing method embodiments.
  • FIG. 3 shows a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the specific embodiment of the present application does not limit the specific implementation of the computer device.
  • the computer device may include: a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
  • the processor 302 , the communication interface 304 , and the memory 306 communicate with each other through the communication bus 308 .
  • the communication interface 304 is used for communicating with network elements of other devices such as clients or other servers.
  • the processor 302 is configured to execute the program 310, and specifically may execute the relevant steps in the above-mentioned embodiments of the public opinion processing method for news events.
  • the program 310 may include program code including computer operation instructions.
  • the processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computer equipment may be the same type of processors, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 306 is used to store the program 310 .
  • Memory 306 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • the memory can be non-volatile or volatile.
  • the program 310 can specifically be used to cause the processor 302 to perform the following operations:
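  • obtaining collected news public opinion information, the news public opinion information being information including the text content of each news event;
  • performing a first classification process on the news public opinion information according to a trained first text classification model;
  • extracting a trained second text classification model matching the first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, the first classification mark being automatically labeled and determined from the training sample sets of different public opinion requirements during training of the first text classification model;
  • extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, mapping the news event content to the corresponding node in the event graph matching the public opinion requirement, and outputting it.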
  • the modules or steps of the present application can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or they may be fabricated separately into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module.
  • the present application is not limited to any particular combination of hardware and software.

Abstract

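A public opinion processing method and device for news events, a storage medium, and a computer device, relating to the field of artificial intelligence and mainly intended to solve the low efficiency of existing public opinion processing of news events. The method includes: obtaining collected news public opinion information (101); performing a first classification process on the news public opinion information according to a trained first text classification model (102); extracting a trained second text classification model matching the first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark (103); and extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, mapping the news event content to the corresponding node in the event graph matching the public opinion requirement, and outputting it (104).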

Description

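Public opinion processing method and device for news events, storage medium, and computer device
This application claims priority to Chinese patent application No. CN202011526767.X, entitled "Public opinion processing method and device for news events, storage medium, and computer device" and filed with the China Patent Office on December 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a public opinion processing method and device for news events, a storage medium, and a computer device.
Background
At present, existing public opinion systems generally obtain data by collecting public opinion information directly from online news content and then analyzing the crawled information manually to mine useful information about the news. However, news content scattered across Internet websites is highly fragmented, so enterprises of different types cannot quickly find the information they need. Obtaining the content an enterprise needs requires specialized staff to spend long periods on intensive data sorting and analysis. Processing the collected news public opinion information over such long periods undermines its timeliness, and with it its value to the enterprise, while consuming substantial human resources. As a result, enterprises process public opinion information containing news content inefficiently.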
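Summary
In view of this, the present application provides a public opinion processing method and device for news events, a storage medium, and a computer device, mainly intended to solve the low efficiency of existing public opinion processing of news events.
According to one aspect of the present application, a public opinion processing method for news events is provided, including:
obtaining collected news public opinion information, the news public opinion information being information containing the text content of each news event;
performing a first classification process on the news public opinion information according to a trained first text classification model;
extracting a trained second text classification model matching the first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, where the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, mapping the news event content to the corresponding node in the event graph matching the public opinion requirement, and outputting it.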
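According to another aspect of the present application, a public opinion processing device for news events is provided, including:
an acquisition module, configured to obtain collected news public opinion information, the news public opinion information being information containing the text content of each news event;
a first processing module, configured to perform a first classification process on the news public opinion information according to a trained first text classification model;
a second processing module, configured to extract a trained second text classification model matching the first classification mark obtained by the first classification process, and to perform, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, where the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
an output module, configured to extract news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, to map the news event content to the corresponding node in the event graph matching the public opinion requirement and output it.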
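According to yet another aspect of the present application, a storage medium is provided, the storage medium storing at least one executable instruction that causes a processor to perform operations corresponding to the above public opinion processing method for news events.
According to still another aspect of the present application, a computer device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the above public opinion processing method for news events.
Through the above technical solutions, the technical solutions provided by the embodiments of the present application have at least the following advantages:
The present application reduces human resource consumption and makes news public opinion processing more efficient, thereby improving the efficiency of public opinion processing of news events.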
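Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be construed as limiting the present application. Throughout the drawings, the same reference signs denote the same components. In the drawings:
FIG. 1 is a flowchart of a public opinion processing method for news events according to an embodiment of the present application;
FIG. 2 is a block diagram of a public opinion processing device for news events according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.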
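An embodiment of the present application provides a public opinion processing method for news events. As shown in FIG. 1, the method includes:
101. Obtain collected news public opinion information.
The news public opinion information is information containing the text content of news events. It may include the full text or part of the text of a news event and is collected in real time by a public opinion system. For example, the public opinion system collects full-length news events from different news websites, such as a complete report on enterprise A's technology development in the first half of the year. The embodiments of the present application impose no specific limitation here.
It should be noted that the collected news public opinion information in step 101 is obtained after the information collected by the public opinion system has been stored. It may be obtained in real time or at preset time intervals, and is then processed.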
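102. Perform a first classification process on the news public opinion information according to a trained first text classification model.
The first text classification model may be any machine learning model with a classification function, such as a neural network model or a support vector machine model; the embodiments impose no specific limitation. Specifically, the first text classification model classifies on the text features of the news public opinion information. Therefore, before the first classification process, the news public opinion information must undergo natural language processing and be converted into word vectors. It is then classified by the trained first text classification model; no specific limitation is imposed.
It should be noted that the classes obtained by the first classification process are labeled with classification marks at the same time. For example, classification based on the news-related words in a news event may yield a business-opportunity mark, a risk mark, a competition mark, and so on. These marks are the ones already labeled in the training sample set when the first text classification model was trained, so the first classification process on newly obtained news public opinion information yields the same marks. The first classification process in the embodiments is therefore a coarse-grained classification.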
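103. Extract a trained second text classification model matching the first classification mark obtained by the first classification process, and perform, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark.
In the embodiments, the first classification process yields first classification marks, such as the business-opportunity, risk, and competition marks in step 102. A second text classification model is matched in advance to each classification mark and trained, so that a fine-grained classification is performed within each coarse-grained class. Specifically, the second text classification model may be any machine learning model, either the same as or different from the first text classification model, such as a neural network model or a support vector machine model; no specific limitation is imposed. In addition, each second text classification model is trained on the training sample set corresponding to one first classification mark. The matching second text classification model can therefore be selected by the first classification mark and used to perform the second classification process on the news public opinion information labeled with that first classification mark.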
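The coarse-to-fine dispatch in steps 102 and 103 can be sketched as follows. This is a minimal Python illustration and not the applicant's implementation. Each "model" is any callable that maps a text to a label, and the keyword rules in `toy_first` and `toy_second` are hypothetical stand-ins for the trained first and second text classification models. The text does not say what happens when no second model is registered for a first mark; skipping such items is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any callable mapping a text to a label (assumption).
Classifier = Callable[[str], str]


def two_stage_classify(
    texts: List[str],
    first_model: Classifier,
    second_models: Dict[str, Classifier],
) -> List[Tuple[str, str, str]]:
    """Return (text, first_mark, second_mark) triples.

    The first mark selects which second-stage model runs, so each
    fine-grained model only sees texts from its own coarse class.
    """
    results: List[Tuple[str, str, str]] = []
    for text in texts:
        first_mark = first_model(text)
        second_model = second_models.get(first_mark)
        if second_model is None:
            continue  # no fine-grained model for this coarse class (assumed: skip)
        results.append((text, first_mark, second_model(text)))
    return results


# Hypothetical keyword rules standing in for trained classifiers.
def toy_first(text: str) -> str:
    if "acquire" in text or "technology" in text:
        return "business_opportunity"
    if "lawsuit" in text:
        return "risk"
    return "competition"


toy_second: Dict[str, Classifier] = {
    "business_opportunity": lambda t: "m_and_a" if "acquire" in t else "new_technology",
    "risk": lambda t: "legal_risk",
}
```

For example, `two_stage_classify(["firm a to acquire firm b", "price war in apples"], toy_first, toy_second)` keeps only the first text, tagged `"business_opportunity"` and `"m_and_a"`. The second text is dropped because no second model is registered for `"competition"`. Keying the second-stage models by first mark is what lets each fine-grained model be trained only on its own coarse class.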
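It should be noted that, to further improve the labeling capability of the first text classification model, the first classification marks are automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model. That is, the business-opportunity, risk, and competition marks are labeled automatically in the training sample set according to different public opinion requirements: a clustering algorithm determines clustering features from the public opinion requirements, and the news event samples in the training sample set are then labeled. No specific limitation is imposed.
104. Extract news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, map the news event content to the corresponding node in the event graph matching the public opinion requirement, and output it.
In the embodiments, the news public opinion information with a second classification mark is obtained by classifying again within a first classification mark, so one first classification mark can yield multiple second classification marks. For example, the second classification marks under the business-opportunity mark include enterprise-related new technology, enterprise mergers and acquisitions, discovery of new markets and sales channels, and so on; no specific limitation is imposed. The resulting news public opinion information therefore carries two classification marks. Its news event content is then extracted and mapped to a node in the event graph for output.
It should be noted that an event graph is an event logic knowledge base that describes the evolution rules and patterns between events. Structurally, an event graph is a directed cyclic graph in which nodes represent events and directed edges represent logical relations between events, such as succession, causality, condition, and hypernym-hyponym relations. To present the event content of news events under different classification marks to users accurately and efficiently, and thereby provide news early warning, the event content is mapped onto the corresponding nodes of the event graph in a way that preserves these logical relations.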
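For the embodiments, to improve the classification accuracy for news public opinion information, the training method of the first text classification model is further defined. The method further includes: constructing a three-layer convolutional neural network model, and extracting feature information, based on three preset convolution kernel sizes, from each news text in the training sample set that has been labeled with a first classification mark; filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the three-layer convolutional neural network model with the training sample set; and optimizing the three-layer convolutional neural network model with the Adam optimizer during training until training is complete, to obtain the first text classification model.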
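The feature path described for the first text classification model follows the familiar TextCNN layout: one convolution per kernel size, max pooling, concatenation, and then a dense softmax layer. The pure-Python forward pass below is only an illustrative sketch. It uses one filter per kernel size with hand-set weights, and it omits training, the word-vector step, and the nonlinearity usually applied after convolution. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv1d_valid(x: Matrix, w: Matrix, bias: float = 0.0) -> List[float]:
    """Slide an (h x d) filter over an (n x d) sequence of word vectors, no padding."""
    h, d = len(w), len(x[0])
    return [
        sum(w[j][k] * x[i + j][k] for j in range(h) for k in range(d)) + bias
        for i in range(len(x) - h + 1)
    ]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(x: Matrix, filters: List[Matrix],
                    dense: Matrix, dense_bias: List[float]) -> List[float]:
    """One filter per kernel size -> max-over-time pooling -> concat -> dense -> softmax."""
    # Max pooling keeps only the strongest response of each filter; the list
    # of pooled values is the concatenated feature vector.
    pooled = [max(conv1d_valid(x, f)) for f in filters]
    logits = [sum(wi * pi for wi, pi in zip(row, pooled)) + b
              for row, b in zip(dense, dense_bias)]
    return softmax(logits)
```

Take a toy 5-token, 2-dimensional input `[[1, 0], [0, 1], [1, 1], [0, 0], [2, 0]]` with all-ones filters of sizes 2, 3 and 4. The pooled features are `[3, 4, 5]`, so with an identity dense layer the third class gets the highest probability.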
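In the embodiments, a convolutional neural network model is selected as the first text classification model to be trained. A three-layer convolutional neural network model is constructed, and kernel-based feature extraction is applied to the training news texts, with kernel sizes preferably set to 2, 3, and 4. The model includes an input layer, a convolutional layer, a pooling layer, a fully connected (dense) layer, and an output layer, arranged in three branches. The features extracted by the three kinds of convolution kernels are filtered by max pooling, and the feature vectors are concatenated. Max pooling takes the several feature values that one filter's convolution layer extracts, keeps only the largest as the pooled value, and discards all others. Keeping the maximum retains only the strongest of these features and discards the weaker ones. The result then passes through a dense layer with a softmax activation for three-class classification. Finally, the Adam optimizer optimizes the convolutional neural network model with the learning rate set to 0.0005. The steps are:
1. Initialize the gradient accumulators and squared-gradient accumulators: V_dω = 0, S_dω = 0, V_db = 0, S_db = 0.
2. At iteration t, compute dω and db by mini-batch gradient descent.
3. Compute the Momentum exponentially weighted averages.
4. Apply the RMSprop update.
5. Compute the bias corrections for Momentum and RMSprop.
6. Update the weights, and repeat until iterative training of the model is complete.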
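The Adam steps listed for the first text classification model can be sketched in plain Python. These are the momentum average, the RMSprop average, bias correction and the weight update. The learning rate 0.0005 comes from the text. β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the commonly used Adam defaults and are assumptions here, since the text does not state them. A real model would apply the same update element-wise to every weight tensor rather than to a flat list of scalars.

```python
import math
from typing import List


class Adam:
    """Adam = momentum average (V) + RMSprop average (S) + bias correction."""

    def __init__(self, n_params: int, lr: float = 0.0005, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        # beta1, beta2 and eps are assumed defaults; only lr is given in the text.
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.v = [0.0] * n_params  # V_dw, V_db initialised to 0
        self.s = [0.0] * n_params  # S_dw, S_db initialised to 0
        self.t = 0                 # iteration counter

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Apply one update given mini-batch gradients (dw, db, ...)."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.v[i] = self.beta1 * self.v[i] + (1 - self.beta1) * g      # momentum
            self.s[i] = self.beta2 * self.s[i] + (1 - self.beta2) * g * g  # RMSprop
            v_hat = self.v[i] / (1 - self.beta1 ** self.t)                 # bias correction
            s_hat = self.s[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * v_hat / (math.sqrt(s_hat) + self.eps))
        return updated
```

On the first step the bias-corrected ratio `v_hat / sqrt(s_hat)` is ±1. Each parameter therefore moves by about the learning rate against its gradient. For example, a gradient of -6.0 moves a parameter from 0.0 to about 0.0005.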
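For the embodiments, to meet different enterprises' needs for public opinion information and to make automatic labeling feasible, the method further includes the following steps before the three-layer convolutional neural network model is constructed: obtaining news text content to be labeled, and determining a public opinion requirement; determining the k value of K-means clustering according to the public opinion requirement, clustering the news text content, and extracting, from each resulting cluster, the feature words whose occurrence counts exceed a preset threshold as first classification mark content; and performing first-mark classification on the clusters based on the first classification mark content.
The news text content to be labeled is the text content stored in the training sample set for training. The public opinion requirement is the number of categories entered directly by the user. For example, entering "competition" and "xx enterprise" gives 2 categories, which determines the k value, so labeling is carried out automatically in line with the public opinion requirement. Clustering yields different clusters, each holding the text content of one class, and the mark of each cluster still has to be determined. A feature word whose occurrence count exceeds the preset threshold is therefore taken as the mark content. For example, suppose that in one of the 2 clusters the word "risk" (风险), counted across feature words such as "risk insurance", "insurance", and "venture capital" (风险投资), exceeds the preset threshold of 7. "Risk" is then taken as the classification mark of that cluster. No specific limitation is imposed.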
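Choosing a cluster's mark from its frequent words can be sketched as below. This sketch assumes the news texts are already tokenized, and it counts whole tokens. The text does not say how to choose between several words that exceed the threshold, so this sketch takes the most frequent one and returns `None` when no word exceeds the threshold. The threshold 7 in the test mirrors the "risk" example; the token lists are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def label_clusters(clusters: Dict[int, List[List[str]]],
                   threshold: int) -> Dict[int, Optional[str]]:
    """Per cluster, pick the most frequent token whose count exceeds `threshold`.

    `clusters` maps a cluster id to its tokenized documents (assumption).
    Returns None for a cluster where no token exceeds the threshold.
    """
    labels: Dict[int, Optional[str]] = {}
    for cid, docs in clusters.items():
        counts = Counter(tok for doc in docs for tok in doc)
        word, freq = counts.most_common(1)[0] if counts else (None, 0)
        labels[cid] = word if freq > threshold else None
    return labels
```

A cluster left unlabeled would need manual review or a lower threshold; the text does not cover that case.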
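It should be noted that K-means clustering proceeds as follows:
1. Determine the k value from the public opinion requirement.
2. Randomly select k data points from the data set as centroids.
3. For each point in the data set, compute its distance to every centroid (for example, the Euclidean distance), and assign the point to the set of the nearest centroid. Once all data are assigned, there are k sets.
4. Recompute the centroid of each set.
5. If the distance between each new centroid and the old one is below a set threshold, the clustering is considered to have reached the expected result and the algorithm terminates. If the centroids have moved substantially, repeat the assignment and centroid-recomputation steps until the requirement is met.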
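The K-means steps above map onto the following sketch. It is a generic textbook K-means on numeric feature vectors. For news texts these vectors might be, for example, TF-IDF or averaged word vectors, which the text does not specify. The tolerance, iteration cap and random seed are illustrative assumptions. An empty cluster keeps its previous centroid, a detail the text leaves open.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(data: List[Point], k: int, tol: float = 1e-6, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain K-means; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(data, k)  # k random data points as initial centroids
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in data]
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        # Stop once no centroid moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels
```

With k set to the number of categories the user entered, the resulting `labels` group the texts into clusters whose marks can then be read off their frequent words.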
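For the embodiments, to improve the classification accuracy for news public opinion information, the training method of the second text classification model is further defined. The method further includes: constructing a two-layer convolutional neural network model, and extracting feature information, based on two preset convolution kernel sizes, from the news text content in the training sample set that is labeled with second classification marks belonging to the first classification mark, where different first classification marks match at least one different second classification mark; filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the two-layer convolutional neural network model with the training sample set; and optimizing the two-layer convolutional neural network model with the Adam optimizer during training until training is complete, to obtain the second text classification model.
In the embodiments, a convolutional neural network model is selected as the second text classification model to be trained. A two-layer convolutional neural network model is constructed, and kernel-based feature extraction is applied to the training news texts, with kernel sizes preferably set to 2 and 3. The model includes an input layer, a convolutional layer, a pooling layer, a fully connected (dense) layer, and an output layer, arranged in two branches. The features extracted by the two kinds of convolution kernels are filtered by max pooling, and the feature vectors are concatenated. The result then passes through a dense layer with a dropout rate of 0.2 for each neural unit and a softmax activation for three-class classification. Finally, the Adam optimizer optimizes the CNN model with the learning rate set to 0.0001. The optimization procedure is the same as for the first text classification model and is not repeated here.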
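For comparison, the hyperparameters stated for the two classifiers can be collected into a small configuration object, together with a dropout helper. The field names are hypothetical. The first model's dropout is left as `None` because the text gives no value for it. The text also does not say which dropout variant is used. The helper below uses the common "inverted" form, which scales kept activations by 1/(1 - rate) during training so that nothing needs rescaling at inference.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TextCNNConfig:
    kernel_sizes: Tuple[int, ...]  # one convolution branch per kernel size
    num_classes: int               # softmax outputs ("three-class" in the text)
    learning_rate: float           # Adam learning rate
    dropout_rate: Optional[float]  # None where the text gives no value


FIRST_MODEL = TextCNNConfig(kernel_sizes=(2, 3, 4), num_classes=3,
                            learning_rate=0.0005, dropout_rate=None)
SECOND_MODEL = TextCNNConfig(kernel_sizes=(2, 3), num_classes=3,
                             learning_rate=0.0001, dropout_rate=0.2)


def dropout(values: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]
```

With `rate=0.2`, each surviving unit is multiplied by 1.25, so the expected activation matches the input.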
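For the embodiments, to obtain news public opinion information more accurately and thereby improve the efficiency of public opinion processing, the method further includes the following steps before the collected news public opinion information is obtained: receiving input public opinion keywords, the public opinion keywords being associated with the public opinion requirement; searching a public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords; and, when the number of matches of the public opinion keywords in a found item of news public opinion information exceeds a preset threshold, determining that news public opinion information as collected news public opinion information.
Specifically, the user can directly enter public opinion keywords associated with the public opinion requirement. For example, with 3 public opinion requirements, 3 or more keywords may be entered, which improves the accuracy of public opinion processing. The public opinion system stores each collected item of news public opinion information in the public opinion information database. When public opinion processing is needed, the news public opinion information in the database is fetched at preset time intervals and searched for items matching the public opinion keywords. For example, the keyword "risk" retrieves all news public opinion information containing the word "risk". An item is determined to be collected news public opinion information only if its number of keyword matches exceeds the preset threshold. For example, if matched news item 1 contains the word "risk" 5 times, which exceeds the preset threshold of 3, news item 1 is taken as collected news public opinion information. No specific limitation is imposed.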
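The keyword screening can be sketched as follows. The in-memory dictionary stands in for the public opinion information database, and the half-open time window stands in for the preset time interval. Summing hits over all keywords is one reading of "the number of matches". All of these are assumptions. Substring counting with `str.count` also works on unsegmented Chinese text, at the cost of matching inside longer words.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def select_news(
    db: Dict[str, Tuple[str, datetime]],
    keywords: List[str],
    threshold: int,
    start: datetime,
    end: datetime,
) -> List[str]:
    """Return ids of items collected in [start, end) whose keyword hits exceed `threshold`.

    `db` maps a news id to (text, collection time). Hits are summed over all
    keywords using non-overlapping substring counts (assumption).
    """
    selected: List[str] = []
    for news_id, (text, collected_at) in db.items():
        if not (start <= collected_at < end):
            continue  # outside the preset time interval
        hits = sum(text.count(kw) for kw in keywords)
        if hits > threshold:
            selected.append(news_id)
    return selected
```

Mirroring the example in the text, an item containing "risk" 5 times passes a threshold of 3, while an item with 2 occurrences does not.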
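For the embodiments, to further define how content is mapped into the event graph and thereby improve public opinion processing efficiency, mapping the news event content, in combination with the first classification mark and the second classification mark, to the corresponding node in the event graph matching the public opinion requirement includes: defining the events of the event graph based on the public opinion keywords, and extracting event content from the news event content; extracting the event logic relations in the event content according to temporal order, causal relations, and hypernym-hyponym relations; establishing the nodes and node relations of the event logic relations according to the temporal order, the causal relations, and the hypernym-hyponym relations, to construct the event graph; and writing the news event content labeled with the first classification mark and the second classification mark into the node of the event graph corresponding to the event content.
Specifically, defining the events of the event graph based on the public opinion keywords means determining the words for the first-layer events of the event graph to be constructed. For example, if the public opinion keyword is "competition", the event of the event graph is "competition", and the graph is built around the event content of the news public opinion information related to competition. The event content can be extracted from the news event content with natural language processing and is the content that captures the core of the whole news event. For example, if the news event is a malicious competition among three business giants in a certain region over apple sales, the event content extracted in subject-verb-object form is "apple sales malicious competition"; no specific limitation is imposed. In addition, news public opinion information differs in collection time, source website, and source column. The event logic relations, that is, the relations between nodes in the event graph, are therefore determined from three signals:
temporal order, from the collection time;
causal relation, from whether the items were collected from the same website;
hypernym-hyponym relation, from whether the items were collected from columns with a hypernym-hyponym relationship.
Each node stores one item of news public opinion information. When the event graph is constructed, temporal order defines each layer of nodes. The nodes storing the event content of each item are then connected by causal relations and by hypernym-hyponym relations. No specific limitation is imposed.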
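A minimal sketch of the graph construction follows, under the heuristics stated above. Nodes are layered by collection date. A "causal" edge links an earlier node to a later one collected from the same website. A "hypernym" edge links a node in a parent column to a node in its child column. The `EventNode` fields, the `column_parent` mapping and the edge labels are hypothetical names. Event extraction itself, the natural language processing step, is out of scope here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class EventNode:
    node_id: str
    event: str        # extracted event content
    first_mark: str   # coarse mark, e.g. "competition"
    second_mark: str  # fine mark under the first mark
    collected: date   # collection date -> temporal layer
    site: str         # source website -> "causal" edges (the text's heuristic)
    column: str       # site column -> "hypernym" edges


def build_event_graph(
    nodes: List[EventNode], column_parent: Dict[str, str]
) -> Tuple[Dict[date, List[str]], List[Tuple[str, str, str]]]:
    """Return (layers keyed by collection date, typed directed edges)."""
    ordered = sorted(nodes, key=lambda n: (n.collected, n.node_id))
    layers: Dict[date, List[str]] = {}
    for n in ordered:
        layers.setdefault(n.collected, []).append(n.node_id)

    edges: List[Tuple[str, str, str]] = []
    for a in ordered:
        for b in ordered:
            if a is b:
                continue
            # Same website, with a collected strictly before b.
            if a.site == b.site and a.collected < b.collected:
                edges.append((a.node_id, b.node_id, "causal"))
            # a sits in the parent column of b's column.
            if column_parent.get(b.column) == a.column:
                edges.append((a.node_id, b.node_id, "hypernym"))
    return layers, edges
```

Because each node carries both classification marks, writing the news event content "into the node" amounts to creating the `EventNode` with its marks set.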
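For the embodiments, to make public opinion processing results available promptly and improve early warning of news public opinion, the method further includes: when a query request for any node in the event graph is received, compiling, according to a preset query level, the event content of each node having a node relation with that node, and outputting it.
The user can trigger a query request for any node in the event graph with the mouse. When the front end receives the query request, it compiles the event content of the nodes related to the queried node according to the preset query level and outputs it. The preset query level specifies how many levels above and below the node the query covers, preferably 2 nodes upward and 2 nodes downward. The event content of those nodes is then compiled for output. Nodes having a node relation with the queried node are those reachable through the levels above and below it; no specific limitation is imposed.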
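The query over a preset number of levels amounts to a bounded breadth-first search in both edge directions. In this sketch the default of two levels up and two levels down follows the preferred values above. The edge list and content dictionary are hypothetical inputs, for instance the output of a graph-building step. The `seen` set also keeps the search finite on cycles, which matters because the event graph is described as directed and cyclic.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def within_levels(start: str, adj: Dict[str, List[str]], max_depth: int) -> Set[str]:
    """Nodes reachable from `start` in 1..max_depth hops along `adj` (BFS)."""
    seen = {start}
    found: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in adj.get(node, []):
            if nxt not in seen:  # also prevents looping on cycles
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found


def query_node(node: str, edges: List[Tuple[str, str]], contents: Dict[str, str],
               up: int = 2, down: int = 2) -> Dict[str, str]:
    """Collect event content of nodes within `up` levels upstream and `down` downstream."""
    out_adj: Dict[str, List[str]] = {}
    in_adj: Dict[str, List[str]] = {}
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)
    related = within_levels(node, in_adj, up) | within_levels(node, out_adj, down)
    return {n: contents[n] for n in sorted(related)}
```

On a chain a → b → c → d → e → f, querying `"d"` returns the content of b, c, e and f. Node a is three levels upstream and is excluded.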
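An embodiment of the present application provides a public opinion processing method for news events. In this embodiment, collected news public opinion information, containing the text content of each news event, is obtained. A first classification process is performed on it according to a trained first text classification model. A trained second text classification model matching the resulting first classification mark is extracted and used to perform a second classification process on the news public opinion information matching the first classification mark. The first classification mark was automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model. Finally, news event content is extracted from the news public opinion information for which a second classification mark is determined. In combination with the first and second classification marks, it is mapped to the corresponding node in the event graph matching the public opinion requirement and output. This meets different enterprise users' needs for accurate news public opinion processing, greatly reduces human resource consumption, and makes news public opinion processing more efficient, thereby improving the efficiency of public opinion processing of news events.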
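Further, as an implementation of the method shown in FIG. 1, an embodiment of the present application provides a public opinion processing device for news events. As shown in FIG. 2, the device includes:
an acquisition module 21, configured to obtain collected news public opinion information, the news public opinion information being information containing the text content of each news event;
a first processing module 22, configured to perform a first classification process on the news public opinion information according to a trained first text classification model;
a second processing module 23, configured to extract a trained second text classification model matching the first classification mark obtained by the first classification process, and to perform, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, where the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
an output module 24, configured to extract news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, to map the news event content to the corresponding node in the event graph matching the public opinion requirement and output it.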
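Further, the device also includes:
a construction module, configured to construct a three-layer convolutional neural network model and to extract feature information, based on three preset convolution kernel sizes, from each news text in the training sample set that has been labeled with a first classification mark;
a training module, configured to filter the feature information through a pooling layer, concatenate the filtered feature vectors, and train the three-layer convolutional neural network model with the training sample set;
an optimization module, configured to optimize the three-layer convolutional neural network model with the Adam optimizer during training until training is complete, to obtain the first text classification model.
Further, the device also includes:
a first determination module, configured to obtain news text content to be labeled and to determine a public opinion requirement;
a clustering module, configured to determine the k value of K-means clustering according to the public opinion requirement, cluster the news text content, and extract, from each resulting cluster, the feature words whose occurrence counts exceed a preset threshold as first classification mark content;
a labeling module, configured to perform first-mark classification on the clusters based on the first classification mark content.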
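Further, the construction module is also configured to construct a two-layer convolutional neural network model and to extract feature information, based on two preset convolution kernel sizes, from the news text content in the training sample set that is labeled with second classification marks belonging to the first classification mark, where different first classification marks match at least one different second classification mark;
the training module is also configured to filter the feature information through a pooling layer, concatenate the filtered feature vectors, and train the two-layer convolutional neural network model with the training sample set;
the optimization module is also configured to optimize the two-layer convolutional neural network model with the Adam optimizer during training until training is complete, to obtain the second text classification model.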
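Further, the device also includes:
a receiving module, configured to receive input public opinion keywords, the public opinion keywords being associated with the public opinion requirement;
a search module, configured to search a public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords;
a second determination module, configured to determine an item of found news public opinion information as collected news public opinion information when the number of matches of the public opinion keywords in it exceeds a preset threshold.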
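Further, the output module includes:
an extraction unit, configured to define the events of the event graph based on the public opinion keywords and to extract event content from the news event content;
a computation unit, configured to extract the event logic relations in the event content according to temporal order, causal relations, and hypernym-hyponym relations;
a construction unit, configured to establish the nodes and node relations of the event logic relations according to the temporal order, the causal relations, and the hypernym-hyponym relations, to construct the event graph;
a writing unit, configured to write the news event content labeled with the first classification mark and the second classification mark into the node of the event graph corresponding to the event content.
Further, the device also includes:
a statistics module, configured to, when a query request for any node in the event graph is received, compile according to a preset query level the event content of each node having a node relation with that node, and output it.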
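An embodiment of the present application provides a public opinion processing device for news events. In this embodiment, collected news public opinion information, containing the text content of each news event, is obtained. A first classification process is performed on it according to a trained first text classification model. A trained second text classification model matching the resulting first classification mark is extracted and used to perform a second classification process on the news public opinion information matching the first classification mark. The first classification mark was automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model. Finally, news event content is extracted from the news public opinion information for which a second classification mark is determined. In combination with the first and second classification marks, it is mapped to the corresponding node in the event graph matching the public opinion requirement and output. This meets different enterprise users' needs for accurate news public opinion processing, greatly reduces human resource consumption, and makes news public opinion processing more efficient, thereby improving the efficiency of public opinion processing of news events.
According to an embodiment of the present application, a storage medium is provided, the storage medium storing at least one executable instruction, and the computer-executable instruction can execute the public opinion processing method for news events in any of the above method embodiments.
FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. The specific embodiments of the present application do not limit the specific implementation of the computer device.
As shown in FIG. 3, the computer device may include a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
The processor 302, the communications interface 304, and the memory 306 communicate with one another through the communication bus 308.
The communications interface 304 is configured to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute a program 310, and may specifically perform the relevant steps in the above embodiments of the public opinion processing method for news events.
Specifically, the program 310 may include program code, and the program code includes computer operation instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is configured to store the program 310. The memory 306 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The memory may be non-volatile or volatile.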
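The program 310 may specifically be used to cause the processor 302 to perform the following operations:
obtaining collected news public opinion information, the news public opinion information being information containing the text content of each news event;
performing a first classification process on the news public opinion information according to a trained first text classification model;
extracting a trained second text classification model matching the first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, where the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and, combining the first classification mark and the second classification mark, mapping the news event content to the corresponding node in the event graph matching the public opinion requirement, and outputting it.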
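Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented by a general-purpose computing device. They can be centralized on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here. Alternatively, they can be fabricated separately into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present application and are not intended to limit it; for those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its protection scope.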

Claims (20)

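  1. A public opinion processing method for news events, comprising:
    obtaining collected news public opinion information, the news public opinion information being information containing the text content of each news event;
    performing a first classification process on the news public opinion information according to a trained first text classification model;
    extracting a trained second text classification model matching a first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, wherein the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
    extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and mapping, in combination with the first classification mark and the second classification mark, the news event content to a corresponding node in an event graph matching the public opinion requirement, and outputting it.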
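  2. The method according to claim 1, further comprising:
    constructing a three-layer convolutional neural network model, and extracting feature information, based on three preset convolution kernel sizes, from each news text in a training sample set that has been labeled with a first classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the three-layer convolutional neural network model with the training sample set;
    optimizing the three-layer convolutional neural network model with an Adam optimizer during training until training of the three-layer convolutional neural network model is complete, to obtain the first text classification model.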
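  3. The method according to claim 2, wherein, before constructing the three-layer convolutional neural network model, the method further comprises:
    obtaining news text content to be labeled, and determining a public opinion requirement;
    determining a k value of K-means clustering according to the public opinion requirement, clustering the news text content, and extracting, from the different clusters obtained, feature words whose occurrence counts exceed a preset threshold as first classification mark content;
    performing first-mark classification on the different clusters based on the first classification mark content.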
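  4. The method according to claim 1, further comprising:
    constructing a two-layer convolutional neural network model, and extracting feature information, based on two preset convolution kernel sizes, from news text content in a training sample set labeled with second classification marks belonging to the first classification mark, wherein different first classification marks match at least one different second classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the two-layer convolutional neural network model with the training sample set;
    optimizing the two-layer convolutional neural network model with an Adam optimizer during training until training of the two-layer convolutional neural network model is complete, to obtain the second text classification model.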
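  5. The method according to claim 1, wherein, before obtaining the collected news public opinion information, the method further comprises:
    receiving input public opinion keywords, the public opinion keywords being associated with the public opinion requirement;
    searching a public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords;
    when the number of matches of the public opinion keywords in the found news public opinion information exceeds a preset threshold, determining the news public opinion information as the collected news public opinion information.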
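  6. The method according to any one of claims 1-5, wherein mapping, in combination with the first classification mark and the second classification mark, the news event content to the corresponding node in the event graph matching the public opinion requirement comprises:
    defining events of the event graph based on the public opinion keywords, and extracting event content from the news event content;
    extracting event logic relations from the event content according to temporal order, causal relations, and hypernym-hyponym relations;
    establishing nodes and node relations of the event logic relations according to the temporal order, the causal relations, and the hypernym-hyponym relations, to construct the event graph;
    writing the news event content labeled with the first classification mark and the second classification mark into the node of the event graph corresponding to the event content.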
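  7. The method according to claim 6, further comprising:
    when a query request for any node in the event graph is received, compiling, according to a preset query level, the event content of each node having a node relation with the node, and outputting it.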
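  8. A public opinion processing device for news events, comprising:
    an acquisition module, configured to obtain collected news public opinion information, the news public opinion information being information containing the text content of each news event;
    a first processing module, configured to perform a first classification process on the news public opinion information according to a trained first text classification model;
    a second processing module, configured to extract a trained second text classification model matching a first classification mark obtained by the first classification process, and to perform, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, wherein the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
    an output module, configured to extract news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and to map, in combination with the first classification mark and the second classification mark, the news event content to a corresponding node in an event graph matching the public opinion requirement, and output it.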
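  9. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement a public opinion processing method for news events, comprising:
    obtaining collected news public opinion information, the news public opinion information being information containing the text content of each news event;
    performing a first classification process on the news public opinion information according to a trained first text classification model;
    extracting a trained second text classification model matching a first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, wherein the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
    extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and mapping, in combination with the first classification mark and the second classification mark, the news event content to a corresponding node in an event graph matching the public opinion requirement, and outputting it.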
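  10. The computer-readable storage medium according to claim 9, wherein the method implemented when the computer-readable instructions are executed by the processor further comprises:
    constructing a three-layer convolutional neural network model, and extracting feature information, based on three preset convolution kernel sizes, from each news text in a training sample set that has been labeled with a first classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the three-layer convolutional neural network model with the training sample set;
    optimizing the three-layer convolutional neural network model with an Adam optimizer during training until training of the three-layer convolutional neural network model is complete, to obtain the first text classification model.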
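  11. The computer-readable storage medium according to claim 10, wherein, when the computer-readable instructions are executed by the processor, before constructing the three-layer convolutional neural network model, the method further comprises:
    obtaining news text content to be labeled, and determining a public opinion requirement;
    determining a k value of K-means clustering according to the public opinion requirement, clustering the news text content, and extracting, from the different clusters obtained, feature words whose occurrence counts exceed a preset threshold as first classification mark content;
    performing first-mark classification on the different clusters based on the first classification mark content.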
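  12. The computer-readable storage medium according to claim 9, wherein the method implemented when the computer-readable instructions are executed by the processor further comprises:
    constructing a two-layer convolutional neural network model, and extracting feature information, based on two preset convolution kernel sizes, from news text content in a training sample set labeled with second classification marks belonging to the first classification mark, wherein different first classification marks match at least one different second classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the two-layer convolutional neural network model with the training sample set;
    optimizing the two-layer convolutional neural network model with an Adam optimizer during training until training of the two-layer convolutional neural network model is complete, to obtain the second text classification model.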
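  13. The computer-readable storage medium according to claim 9, wherein, when the computer-readable instructions are executed by the processor, before obtaining the collected news public opinion information, the method further comprises:
    receiving input public opinion keywords, the public opinion keywords being associated with the public opinion requirement;
    searching a public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords;
    when the number of matches of the public opinion keywords in the found news public opinion information exceeds a preset threshold, determining the news public opinion information as the collected news public opinion information.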
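  14. The computer-readable storage medium according to any one of claims 9-13, wherein, when the computer-readable instructions are executed by the processor, mapping, in combination with the first classification mark and the second classification mark, the news event content to the corresponding node in the event graph matching the public opinion requirement comprises:
    defining events of the event graph based on the public opinion keywords, and extracting event content from the news event content;
    extracting event logic relations from the event content according to temporal order, causal relations, and hypernym-hyponym relations;
    establishing nodes and node relations of the event logic relations according to the temporal order, the causal relations, and the hypernym-hyponym relations, to construct the event graph;
    writing the news event content labeled with the first classification mark and the second classification mark into the node of the event graph corresponding to the event content.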
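  15. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement a public opinion processing method for news events, comprising:
    obtaining collected news public opinion information, the news public opinion information being information containing the text content of each news event;
    performing a first classification process on the news public opinion information according to a trained first text classification model;
    extracting a trained second text classification model matching a first classification mark obtained by the first classification process, and performing, according to the second text classification model, a second classification process on the news public opinion information matching the first classification mark, wherein the first classification mark is automatically labeled and determined from training sample sets of different public opinion requirements during training of the first text classification model;
    extracting news event content from the news public opinion information for which a second classification mark is determined by the second classification process, and mapping, in combination with the first classification mark and the second classification mark, the news event content to a corresponding node in an event graph matching the public opinion requirement, and outputting it.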
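  16. The computer device according to claim 15, wherein the method implemented when the computer-readable instructions are executed by the processor further comprises:
    constructing a three-layer convolutional neural network model, and extracting feature information, based on three preset convolution kernel sizes, from each news text in a training sample set that has been labeled with a first classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the three-layer convolutional neural network model with the training sample set;
    optimizing the three-layer convolutional neural network model with an Adam optimizer during training until training of the three-layer convolutional neural network model is complete, to obtain the first text classification model.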
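  17. The computer device according to claim 16, wherein, when the computer-readable instructions are executed by the processor, before constructing the three-layer convolutional neural network model, the method further comprises:
    obtaining news text content to be labeled, and determining a public opinion requirement;
    determining a k value of K-means clustering according to the public opinion requirement, clustering the news text content, and extracting, from the different clusters obtained, feature words whose occurrence counts exceed a preset threshold as first classification mark content;
    performing first-mark classification on the different clusters based on the first classification mark content.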
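  18. The computer device according to claim 15, wherein the method implemented when the computer-readable instructions are executed by the processor further comprises:
    constructing a two-layer convolutional neural network model, and extracting feature information, based on two preset convolution kernel sizes, from news text content in a training sample set labeled with second classification marks belonging to the first classification mark, wherein different first classification marks match at least one different second classification mark;
    filtering the feature information through a pooling layer, concatenating the filtered feature vectors, and training the two-layer convolutional neural network model with the training sample set;
    optimizing the two-layer convolutional neural network model with an Adam optimizer during training until training of the two-layer convolutional neural network model is complete, to obtain the second text classification model.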
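  19. The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, before obtaining the collected news public opinion information, the method further comprises:
    receiving input public opinion keywords, the public opinion keywords being associated with the public opinion requirement;
    searching a public opinion information database collected within a preset time interval for news public opinion information matching the public opinion keywords;
    when the number of matches of the public opinion keywords in the found news public opinion information exceeds a preset threshold, determining the news public opinion information as the collected news public opinion information.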
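  20. The computer device according to any one of claims 15-19, wherein, when the computer-readable instructions are executed by the processor, mapping, in combination with the first classification mark and the second classification mark, the news event content to the corresponding node in the event graph matching the public opinion requirement comprises:
    defining events of the event graph based on the public opinion keywords, and extracting event content from the news event content;
    extracting event logic relations from the event content according to temporal order, causal relations, and hypernym-hyponym relations;
    establishing nodes and node relations of the event logic relations according to the temporal order, the causal relations, and the hypernym-hyponym relations, to construct the event graph;
    writing the news event content labeled with the first classification mark and the second classification mark into the node of the event graph corresponding to the event content.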
PCT/CN2021/124890 2020-12-22 2021-10-20 Public opinion processing method and device for news events, storage medium, and computer device WO2022134794A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011526767.X 2020-12-22
CN202011526767.XA CN112650923A (zh) 2020-12-22 2020-12-22 Public opinion processing method and device for news events, storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2022134794A1 true WO2022134794A1 (zh) 2022-06-30

Family

ID=75358945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124890 WO2022134794A1 (zh) 2020-12-22 2021-10-20 Public opinion processing method and device for news events, storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN112650923A (zh)
WO (1) WO2022134794A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650923A (zh) * 2020-12-22 2021-04-13 深圳壹账通智能科技有限公司 Public opinion processing method and device for news events, storage medium, and computer device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
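CN110489550A (zh) * 2019-07-16 2019-11-22 招联消费金融有限公司 Text classification method and device based on a combined neural network, and computer device
US20200050618A1 (en) * 2018-08-09 2020-02-13 Walmart Apollo, Llc System and method for electronic text classification
CN111161726A (zh) * 2019-12-24 2020-05-15 广州索答信息科技有限公司 Intelligent voice interaction method, device, medium, and system
CN112650923A (zh) * 2020-12-22 2021-04-13 深圳壹账通智能科技有限公司 Public opinion processing method and device for news events, storage medium, and computer device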

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHAN XIAOHONG, ET AL.: "Research on Internet Public Opinion Event Prediction Method Based on Event Evolution Graph", INFORMATION STUDIES: THEORY & APPLICATION, vol. 43, no. 10, 31 October 2020 (2020-10-31), XP055945531, ISSN: 1000-7490, DOI: 10.16353/j.cnki.1000-7490.2020.10.027 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
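CN111090744A (zh) * 2019-12-17 2020-05-01 中科鼎富(北京)科技发展有限公司 Method and device for mining stock market operational risk information
CN114880491A (zh) * 2022-07-08 2022-08-09 云孚科技(北京)有限公司 Method and system for automatically constructing an event graph
CN114969382A (zh) * 2022-07-19 2022-08-30 国网浙江省电力有限公司信息通信分公司 Entity generation method based on event-chain reasoning over an event graph
CN114969382B (zh) * 2022-07-19 2022-10-21 国网浙江省电力有限公司信息通信分公司 Entity generation method based on event-chain reasoning over an event graph
CN115358896A (zh) * 2022-10-20 2022-11-18 四川大学华西医院 Method, device, equipment, and medium for constructing a criminal-charge evolution network from massive documents
CN115358896B (zh) * 2022-10-20 2023-02-03 四川大学华西医院 Method, device, equipment, and medium for constructing a criminal-charge evolution network from massive documents
CN115827989A (zh) * 2023-02-16 2023-03-21 杭州金诚信息安全科技有限公司 Artificial intelligence early-warning system and method for online public opinion in a big data environment
CN116522013A (zh) * 2023-06-29 2023-08-01 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on a social network platform
CN116522013B (zh) * 2023-06-29 2023-09-05 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on a social network platform
CN117312634A (zh) * 2023-11-29 2023-12-29 大文传媒集团(山东)有限公司 Artificial intelligence data integration and dissemination processing system
CN117312634B (zh) * 2023-11-29 2024-02-20 大文传媒集团(山东)有限公司 Artificial intelligence data integration and dissemination processing system
CN117649117A (zh) * 2024-01-30 2024-03-05 浙江数洋科技有限公司 Method and device for determining a disposal plan, and computer device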

Also Published As

Publication number Publication date
CN112650923A (zh) 2021-04-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908795

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.10.2023)