CN109359296B - Public opinion emotion recognition method and device and computer readable storage medium - Google Patents

Publication number
CN109359296B
Authority
CN
China
Legal status
Active
Application number
CN201811096799.3A
Other languages
Chinese (zh)
Other versions
CN109359296A (en)
Inventor
郑少杰
蔡远航
付勇
林文聪
范增虎
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN201811096799.3A
Publication of CN109359296A
Application granted
Publication of CN109359296B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a public opinion emotion recognition method, which comprises the following steps: performing topic clustering on a plurality of public opinion documents in the same field by adopting a topic model algorithm to obtain a plurality of topic clusters, wherein each topic cluster comprises one or more documents; performing positive and negative emotion labeling on the corresponding topic clusters to obtain documents with positive and negative emotion labels; taking the documents with the positive and negative emotion labels as training samples, and training an emotion recognition model; and performing emotion recognition on a target public opinion document to be recognized based on the emotion recognition model. The invention also discloses a public opinion emotion recognition device and a computer readable storage medium. The method and the device improve the labeling efficiency of the emotion corpus and reduce the operational difficulty of public opinion emotion recognition.

Description

Public opinion emotion recognition method and device and computer readable storage medium
Technical Field
The present invention relates to the field of emotion recognition technologies, and in particular, to a method and apparatus for identifying public opinion emotion, and a computer readable storage medium.
Background
In traditional emotion analysis, a large number of positive and negative emotion corpora are labeled entirely by hand. A model is trained on these corpora to extract the emotion words they contain, emotion recognition is then performed on a piece of text based on the distribution of those emotion words, and the emotional tendency of the document is determined, for example whether the text expresses positive emotion or negative emotion.
Generally, different industry fields define the emotion of text public opinion differently, so the same emotion corpus is difficult to transfer to all industry fields. A large emotion corpus must therefore be produced for each industry field, and every item in it must be labeled manually. This requires a great deal of manpower as well as the corresponding professional background knowledge, which lowers the labeling efficiency of the emotion corpus and increases the operational difficulty of public opinion emotion recognition.
Disclosure of Invention
The invention mainly aims to provide a public opinion emotion recognition method, a device and a computer readable storage medium, which aim to solve the technical problems of improving the labeling efficiency of emotion corpus and reducing the operation difficulty of public opinion emotion recognition.
In order to achieve the above object, the present invention provides a method for identifying public opinion emotion, the method for identifying public opinion emotion comprising:
performing topic clustering on a plurality of public opinion documents in the same field by adopting a topic model algorithm to obtain a plurality of topic clusters, wherein each topic cluster comprises one or a plurality of documents;
performing positive and negative emotion labeling on the corresponding topic clusters to obtain documents with positive and negative emotion labels;
taking the document with the positive and negative emotion labels as a training sample, and training an emotion recognition model;
and carrying out emotion recognition on the target public opinion document to be recognized based on the emotion recognition model.
Optionally, the performing positive and negative emotion labeling on the corresponding topic clusters includes:
screening out topic clusters with emotion tendencies from all topic clusters based on a preset emotion dictionary, and labeling them with positive and negative emotions.
Optionally, the performing positive and negative emotion labeling on the corresponding topic clusters includes:
acquiring a topic cluster designated by a user and the positive or negative emotion corresponding to the topic cluster;
and labeling the topic cluster designated by the user with the positive or negative emotion.
Optionally, after the step of performing positive and negative emotion labeling on the corresponding topic clusters to obtain the documents with positive and negative emotion labels, the method further comprises:
judging whether the number of topic clusters labeled with positive and negative emotions in the current round of topic clustering meets the condition for forming a training sample;
if yes, stopping and not performing the next round of topic clustering;
if not, increasing the number of topic clusters output by topic clustering and continuing to adopt the topic model algorithm to perform the next round of topic clustering on the public opinion documents.
Optionally, based on the emotion recognition model, performing emotion recognition on the target public opinion document to be recognized includes:
extracting key sentences in a target public opinion document to be identified as text abstracts;
and carrying out emotion recognition on the text abstract based on the emotion recognition model.
Optionally, extracting key sentences in the target public opinion document to be identified as the text abstract includes:
sentence segmentation is carried out on the target public opinion document to obtain all sentences forming the target public opinion document;
calculating the similarity between the title of the target public opinion document and each sentence;
sorting each sentence based on the similarity;
selecting a specified number of sentences, from all of the sorted sentences or from a specified number of the top-ranked sentences, through a maximal marginal relevance (MMR) algorithm;
and taking the selected sentences as key sentences in the target public opinion document to form a text abstract.
Optionally, the emotion recognition of the text abstract based on the emotion recognition model includes:
word segmentation is carried out on the text abstract to obtain a plurality of words;
constructing word vectors corresponding to the words in the text abstract based on the words obtained by word segmentation;
and merging the word vectors into sentence vectors, and inputting the sentence vectors into the emotion recognition model to perform emotion recognition on the text abstract.
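The text does not say how the word vectors are merged into a sentence vector. As a minimal sketch, assuming the merge is a plain element-wise average (one common choice, not something stated in the source), and using a made-up two-dimensional embedding table and a hypothetical helper name `sentence_vector`:

```python
from typing import Dict, List


def sentence_vector(words: List[str],
                    embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the words that have an embedding.

    Averaging is an assumption made for illustration; the source only
    says that the word vectors are "merged".
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy embeddings invented for this example; real ones would come from
# a trained word-vector model.
toy_embeddings = {"loan": [1.0, 0.0], "fraud": [0.0, 1.0]}
vec = sentence_vector(["loan", "fraud", "unknown"], toy_embeddings)
print(vec)
```

Words without an embedding are skipped here, which is a design choice of the sketch rather than a requirement of the method.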
Further, in order to achieve the above object, the present invention provides a public opinion emotion recognition device, which includes a memory, a processor, and a public opinion emotion recognition program stored in the memory and executable on the processor, wherein the public opinion emotion recognition program when executed by the processor implements the steps of the public opinion emotion recognition method according to any one of the above.
Further, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a public opinion emotion recognition program which, when executed by a processor, implements the steps of the public opinion emotion recognition method according to any one of the above.
According to the invention, a topic model algorithm is adopted to cluster a plurality of public opinion documents in the same field, so that all of the documents are grouped under a plurality of topic clusters, each topic cluster comprising one or more documents. Labeling a topic cluster with positive or negative emotion therefore labels all of the documents under that topic cluster. The documents with positive and negative emotion labels are then used as training samples to train an emotion recognition model, and emotion recognition is performed with that model. This improves the labeling efficiency of the emotion corpus and reduces the operational difficulty of public opinion emotion recognition.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment of a device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of a method for identifying public opinion emotion according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of the method for identifying public opinion emotion according to the present invention;
FIG. 4 is a schematic diagram of a refinement flow chart of step S40 in FIG. 2;
fig. 5 is a schematic diagram of a refinement flow chart of step S401 in fig. 4;
fig. 6 is a schematic diagram of a refinement flow of step S402 in fig. 4.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a public opinion emotion recognition device.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment of a device according to an embodiment of the present invention.
As shown in fig. 1, the public opinion emotion recognition device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above. It should be noted that the processor 1001 is mounted in the public opinion emotion recognition device as an embedded chip.
It will be appreciated by those skilled in the art that the hardware configuration of the public opinion emotion recognition device shown in fig. 1 is not limiting and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a public opinion emotion recognition program may be included in a memory 1005 as one type of computer-readable storage medium. The operating system is a program for managing and controlling the public opinion emotion recognition device and software resources and supports the operation of a network communication module, a user interface module, a public opinion emotion recognition program and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
In the hardware structure of the public opinion emotion recognition device shown in fig. 1, the network interface 1004 is mainly used for connecting to a system background and performing data communication with the system background; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; the public opinion emotion recognition apparatus calls the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and performs the following operations:
performing topic clustering on a plurality of public opinion documents in the same field by adopting a topic model algorithm to obtain a plurality of topic clusters, wherein each topic cluster comprises one or a plurality of documents;
performing positive and negative emotion labeling on the corresponding topic clusters to obtain documents with positive and negative emotion labels;
taking the document with the positive and negative emotion labels as a training sample, and training an emotion recognition model;
and carrying out emotion recognition on the target public opinion document to be recognized based on the emotion recognition model.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
screening out topic clusters with emotion tendencies from all topic clusters based on a preset emotion dictionary, and labeling them with positive and negative emotions.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
acquiring a topic cluster designated by a user and the positive or negative emotion corresponding to the topic cluster;
and labeling the topic cluster designated by the user with the positive or negative emotion.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
judging whether the number of topic clusters labeled with positive and negative emotions in the current round of topic clustering meets the condition for forming a training sample;
if yes, stopping and not performing the next round of topic clustering;
if not, increasing the number of topic clusters output by topic clustering and continuing to adopt the topic model algorithm to perform the next round of topic clustering on the public opinion documents.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
extracting key sentences in a target public opinion document to be identified as text abstracts;
and carrying out emotion recognition on the text abstract based on the emotion recognition model.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
sentence segmentation is carried out on the target public opinion document to obtain all sentences forming the target public opinion document;
calculating the similarity between the title of the target public opinion document and each sentence;
sorting each sentence based on the similarity;
selecting a specified number of sentences, from all of the sorted sentences or from a specified number of the top-ranked sentences, through a maximal marginal relevance (MMR) algorithm;
and taking the selected sentences as key sentences in the target public opinion document to form a text abstract.
Further, the public opinion emotion recognition device invokes the public opinion emotion recognition program stored in the memory 1005 through the processor 1001, and also performs the following operations:
word segmentation is carried out on the text abstract to obtain a plurality of words;
constructing word vectors corresponding to the words in the text abstract based on the words obtained by word segmentation;
and merging the word vectors into sentence vectors, and inputting the sentence vectors into the emotion recognition model to perform emotion recognition on the text abstract.
Based on the hardware running environment of the device for identifying public opinion emotion in the above embodiment, the following embodiments of the method for identifying public opinion emotion are provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the public opinion emotion recognition method of the present invention. In this embodiment, the public opinion emotion recognition method includes:
step S10, performing topic clustering on a plurality of public opinion documents in the same field by adopting a topic model algorithm to obtain a plurality of topic clusters, wherein each topic cluster comprises one or a plurality of documents;
The topic model algorithm is a method for modeling the implicit topics in documents: each word of a document is regarded as being generated by selecting a topic with a certain probability and then selecting a word from that topic with a certain probability.
In this embodiment, an LDA (Latent Dirichlet Allocation) topic model algorithm is preferably used to perform topic clustering on the public opinion documents. A document contains one or more words; it should be noted that a word here refers to a single word, which may be a foreign-language word or a Chinese word.
Before the topic model algorithm is run, the input public opinion documents are preprocessed, for example by word segmentation and stop-word removal, so that each document is split into a plurality of words. After the algorithm is run, the specified number of topic clusters is output, each topic cluster comprising one or more documents, together with data such as the topic probability distribution of each document and the document probability distribution under each topic.
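The source does not spell out how documents are placed into topic clusters once the topic probability distribution of each document is available. A minimal sketch, assuming each document is assigned to its single most probable topic (an assumption for illustration), with made-up document ids and probabilities and a hypothetical helper name `group_by_dominant_topic`:

```python
from collections import defaultdict
from typing import Dict, List


def group_by_dominant_topic(
        doc_topic: Dict[str, List[float]]) -> Dict[int, List[str]]:
    """Group document ids by the index of their highest-probability topic."""
    clusters = defaultdict(list)
    for doc_id, probs in doc_topic.items():
        # Index of the largest probability in this document's distribution.
        dominant = max(range(len(probs)), key=probs.__getitem__)
        clusters[dominant].append(doc_id)
    return dict(clusters)


# Hypothetical per-document topic distributions, such as an LDA run
# with three topics might output.
doc_topic = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.6, 0.3, 0.1],
}
clusters = group_by_dominant_topic(doc_topic)
print(clusters)
```

Topics that no document selects as dominant simply do not appear in the result.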
Step S20, performing positive and negative emotion labeling on the corresponding topic clusters to obtain documents with positive and negative emotion labels;
Emotions can generally be classified into positive emotions (for example, words expressing optimism or happiness), negative emotions (for example, words expressing pessimism), and neutral emotions (for example, words without emotional color, such as numbers and names). Among the topic clusters obtained through topic clustering, some have a positive emotion tendency, some have a negative emotion tendency, and others have no emotion tendency.
In this embodiment, emotion labeling is preferably performed on the topic clusters that have a positive or negative emotion tendency. Labeling a topic cluster labels all of the documents under it, so that documents with positive and negative emotion labels are obtained.
This embodiment does not limit the manner in which the positive and negative emotion labeling of the topic clusters is implemented.
(1) Automatic labeling by machine
In this implementation, the public opinion emotion recognition device completes the positive and negative emotion labeling of the topic clusters automatically, without the user taking part in the labeling.
Optionally, the performing positive and negative emotion labeling on the corresponding topic clusters includes: screening out topic clusters with emotion tendencies from all topic clusters based on a preset emotion dictionary, and labeling them with positive and negative emotions.
This alternative embodiment requires an emotion dictionary, formed in advance from a large number of words labeled positive or negative. The words in each topic cluster are compared one by one with the words in the emotion dictionary, so that topic clusters with emotion tendencies are screened out and labeled with positive or negative emotion. If a topic cluster contains many words, only its top N words ranked by probability (for example, N = 5) need be compared with the emotion dictionary, which improves the efficiency of emotion labeling.
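The dictionary comparison described above can be sketched as follows. This is only an illustration: the majority-vote rule for deciding the polarity, the helper name `label_cluster`, and the tiny dictionary are assumptions, since the source does not state how matches are turned into a label.

```python
from typing import Dict, List, Optional


def label_cluster(top_words: List[str],
                  emotion_dict: Dict[str, str],
                  top_n: int = 5) -> Optional[str]:
    """Label a topic cluster from its top-N words.

    top_words is assumed to be sorted by descending topic probability.
    Returns "positive", "negative", or None when the cluster shows no
    clear emotion tendency (no matches, or a tie).
    """
    pos = neg = 0
    for word in top_words[:top_n]:
        polarity = emotion_dict.get(word)
        if polarity == "positive":
            pos += 1
        elif polarity == "negative":
            neg += 1
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


# Made-up emotion dictionary for the example.
emotion_dict = {"growth": "positive", "profit": "positive",
                "fraud": "negative"}
print(label_cluster(["profit", "growth", "bank"], emotion_dict))
print(label_cluster(["bank", "report"], emotion_dict))
```

Clusters for which the function returns None are the ones without an emotion tendency and would be left out of the training samples.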
(2) Labeling with user participation
In this implementation, the user takes part in the labeling, and the public opinion emotion recognition device performs positive and negative emotion labeling based on data submitted by the user.
Optionally, the performing positive and negative emotion labeling on the corresponding topic clusters includes: acquiring a topic cluster designated by a user and the positive or negative emotion corresponding to the topic cluster; and labeling the topic cluster designated by the user with that emotion.
In this alternative embodiment, the user manually designates the topic clusters that have an emotion tendency, specifies whether each is positive or negative, and submits them to the public opinion emotion recognition device; the device acquires the topic clusters to be labeled together with their emotions and labels the user-designated topic clusters accordingly.
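Whichever labeling mode is used, a label given to a topic cluster is carried over to every document under it. A minimal sketch of that propagation, with hypothetical names (`propagate_labels`, the cluster ids, the document ids) chosen only for illustration:

```python
from typing import Dict, List


def propagate_labels(clusters: Dict[int, List[str]],
                     cluster_labels: Dict[int, str]) -> Dict[str, str]:
    """Give every document the label of its topic cluster.

    Clusters without a label (no emotion tendency) contribute no
    training samples.
    """
    doc_labels: Dict[str, str] = {}
    for cluster_id, label in cluster_labels.items():
        for doc_id in clusters.get(cluster_id, []):
            doc_labels[doc_id] = label
    return doc_labels


# Three clusters, of which the user labeled only two.
clusters_example = {0: ["d1", "d2"], 1: ["d3"], 2: ["d4"]}
user_labels = {0: "positive", 1: "negative"}
doc_labels = propagate_labels(clusters_example, user_labels)
print(doc_labels)
```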
Of the two labeling modes, automatic machine labeling is simpler to operate and more efficient than labeling with user participation, but its labeling quality is poorer, which may affect the accuracy of emotion recognition. Therefore, in this embodiment, the topic clusters are preferably labeled in a semi-automatic mode with manual participation. Because the public opinion documents have already been clustered by topic, the workload of manual labeling is greatly reduced.
For example, suppose a set of public opinion documents contains 10000 documents. With the traditional fully manual labeling mode, the 10000 documents must be labeled one by one. With the semi-automatic labeling mode, the 10000 documents are clustered into 10 topic clusters and only the 10 topic clusters are labeled, so the number of labeling operations falls from 10000 to 10. The manual labeling workload is greatly reduced while the accuracy of the subsequent positive and negative emotion recognition is maintained.
Step S30, taking the documents with positive and negative emotion labels as training samples, and training an emotion recognition model;
In this embodiment, an emotion recognition model must be trained in advance in order to perform emotion recognition. Specifically, the documents with positive and negative emotion labels are used as training samples, and a preset machine learning algorithm is applied to them to obtain the emotion recognition model.
This embodiment does not limit the machine learning algorithm used to train the emotion recognition model; it may be, for example, a decision tree, a neural network, or logistic regression.
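As one illustration of the training step, the sketch below fits a logistic regression classifier on bag-of-words counts using plain stochastic gradient descent. The feature representation, the hyperparameters, the function names and the four toy samples are all assumptions made for the example; the source only names logistic regression as one permissible algorithm.

```python
import math
from typing import Dict, List, Tuple


def featurize(words: List[str], vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words count vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def train_logreg(samples: List[Tuple[List[str], int]],
                 epochs: int = 200, lr: float = 0.5):
    """Train logistic regression; label 1 = positive, 0 = negative."""
    vocab: Dict[str, int] = {}
    for words, _ in samples:
        for w in words:
            vocab.setdefault(w, len(vocab))
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for words, label in samples:
            x = featurize(words, vocab)
            z = bias + sum(wi * xi for wi, xi in zip(weights, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - label  # gradient of the log loss w.r.t. z
            weights = [wi - lr * err * xi for wi, xi in zip(weights, x)]
            bias -= lr * err
    return vocab, weights, bias


def predict(words: List[str], vocab, weights, bias) -> int:
    x = featurize(words, vocab)
    z = bias + sum(wi * xi for wi, xi in zip(weights, x))
    return 1 if z >= 0 else 0


# Toy training samples (already segmented into words), invented here.
training = [
    (["profit", "growth"], 1),
    (["record", "profit"], 1),
    (["fraud", "loss"], 0),
    (["loss", "lawsuit"], 0),
]
vocab, weights, bias = train_logreg(training)
print(predict(["profit", "growth"], vocab, weights, bias))
```

In practice the training samples would be the documents labeled through their topic clusters, and a library implementation would normally replace this hand-written loop.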
And step S40, carrying out emotion recognition on the target public opinion document to be recognized based on the emotion recognition model.
In this embodiment, after an emotion recognition model is obtained through training, only the target public opinion document to be recognized needs to be input into the emotion recognition model, and the emotion corresponding to the target public opinion document, for example, positive emotion or negative emotion, can be output.
In this embodiment, a topic model algorithm is adopted to cluster a plurality of public opinion documents in the same field, so that all of the documents are grouped under a plurality of topic clusters, each topic cluster comprising one or more documents. Labeling a topic cluster with positive or negative emotion therefore labels all of the documents under that topic cluster.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the public opinion emotion recognition method according to the present invention. Based on the first embodiment, in this embodiment, after the step S20, the method further includes:
Step S50, judging whether the number of topic clusters labeled with positive and negative emotions in the current round of topic clustering, and the ratio between the numbers of positive and negative topic clusters, meet the condition for forming a training sample;
In general, the recognition rate of the emotion recognition model depends mainly on the training samples: the richer the training samples and the more accurate the emotion labeling, the higher the recognition rate of the trained model. Therefore, in this embodiment, better training samples are preferably obtained through multiple rounds of topic clustering.
In this embodiment, the number of topic clusters labeled with positive and negative emotions and the ratio between the numbers of positive and negative topic clusters are preferably used as the screening conditions for obtaining better training samples. Provided that the public opinion documents being clustered are unchanged, the more topic clusters the clustering produces, the richer the training samples, the finer the emotion division, and the more accurate the emotion labeling.
Note, however, that a finer emotion division is not always better. For example, suppose a set of public opinion documents contains 100 documents, which can be divided into 5, 10 or 20 topic clusters. If labeling is performed on 5 topic clusters, with 20 documents per cluster on average, the emotion division is too coarse and cannot distinguish the documents further; if labeling is performed on 20 topic clusters, with 5 documents per cluster on average, the emotion division is too fine and labeling becomes difficult; if labeling is performed on 10 topic clusters, with 10 documents per cluster on average, the emotion division is relatively appropriate.
This embodiment does not limit the specific condition for forming a training sample, which is set according to practical experience. For example, the number of documents in each topic cluster may be required to fall within a specified range, such as 10 to 20 documents, and the numbers of positive and negative topic clusters may be required to be relatively balanced, for example differing by no more than 10%.
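The example conditions above could be checked along the following lines. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and "differing by no more than 10%" is interpreted here as the absolute difference between the positive and negative cluster counts divided by their sum, which is only one possible reading.

```python
from typing import Dict, List


def meets_training_condition(clusters: Dict[int, List[str]],
                             cluster_labels: Dict[int, str],
                             min_docs: int = 10,
                             max_docs: int = 20,
                             max_imbalance: float = 0.10) -> bool:
    """Check cluster sizes and the positive/negative balance."""
    labeled = [cid for cid in cluster_labels if cid in clusters]
    if not labeled:
        return False
    # Every labeled cluster must hold between min_docs and max_docs documents.
    if any(not (min_docs <= len(clusters[cid]) <= max_docs)
           for cid in labeled):
        return False
    pos = sum(1 for cid in labeled if cluster_labels[cid] == "positive")
    neg = sum(1 for cid in labeled if cluster_labels[cid] == "negative")
    if pos == 0 or neg == 0:
        return False
    # Assumed reading of "within 10%": relative difference of the counts.
    return abs(pos - neg) / (pos + neg) <= max_imbalance


balanced = {0: ["d"] * 12, 1: ["d"] * 15}
labels_ok = {0: "positive", 1: "negative"}
print(meets_training_condition(balanced, labels_ok))
```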
Step S60, if yes, stopping and not performing the next round of topic clustering; if not, increasing the number of topic clusters output by topic clustering and continuing to adopt the topic model algorithm to perform the next round of topic clustering on the public opinion documents.
In this embodiment, the first round of topic clustering is preferably performed with a small number of clusters. If the number of topic clusters labeled with positive and negative emotions and the ratio between the numbers of positive and negative topic clusters do not meet the condition for forming a training sample, the next round of topic clustering is performed with a larger number of clusters, so as to obtain better training samples.
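Steps S50 and S60 together form a loop over the cluster count. The sketch below shows only the control flow; `cluster_fn`, `label_fn` and `ready_fn` are hypothetical callables standing in for the topic model, the labeling step and the training-sample condition, and the starting count, step size and upper bound are assumptions. The stub functions exist purely so the loop can be run.

```python
from typing import Callable, Dict, List, Optional, Tuple


def cluster_until_ready(documents: List,
                        cluster_fn: Callable,
                        label_fn: Callable,
                        ready_fn: Callable,
                        start_k: int = 5,
                        step: int = 5,
                        max_k: int = 50) -> Optional[Tuple[int, Dict, Dict]]:
    """Re-cluster with a growing cluster count until ready_fn accepts."""
    k = start_k
    while k <= max_k:
        clusters = cluster_fn(documents, k)
        labels = label_fn(clusters)
        if ready_fn(clusters, labels):
            return k, clusters, labels
        k += step  # not good enough: next round with more clusters
    return None  # no acceptable clustering found within max_k


# Stubs for demonstration only; they are not a topic model.
def round_robin(documents, k):
    clusters = {i: [] for i in range(k)}
    for idx, doc in enumerate(documents):
        clusters[idx % k].append(doc)
    return clusters


def alternate_labels(clusters):
    return {cid: ("positive" if cid % 2 == 0 else "negative")
            for cid in clusters}


def small_enough(clusters, labels):
    return all(len(docs) <= 10 for docs in clusters.values())


result = cluster_until_ready(list(range(100)), round_robin,
                             alternate_labels, small_enough)
print(result[0])
```

The upper bound `max_k` is added in the sketch so the loop always terminates; the source does not mention such a limit.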
In this embodiment, multiple rounds of topic clustering are performed with a changing number of clusters, so that training samples that are richer and more accurately labeled are obtained, which improves the recognition accuracy of the emotion recognition model.
Referring to fig. 4, fig. 4 is a schematic diagram of a refinement flow of step S40 in fig. 2. Based on the first embodiment, in this embodiment, the step S40 further includes:
step S401, extracting key sentences in a target public opinion document to be identified as text abstracts;
and step S402, carrying out emotion recognition on the text abstract based on the emotion recognition model.
In conventional public opinion emotion recognition, the title of a public opinion document is usually input into the emotion recognition model. However, a title alone sometimes carries too little information to stand in for the document body. If the full text of the document is used instead, the many different styles of description within it make it difficult for the emotion recognition model to grasp the focus, which reduces recognition accuracy.
Therefore, in this embodiment, key sentences in the target public opinion document are preferably used as a text abstract, and emotion recognition is performed on the text abstract in place of the public opinion document. Key sentences are the sentences that best represent the emotion in the public opinion document.
Conventional public opinion emotion recognition takes the title or the full text as input and suffers from low recognition accuracy. In this embodiment, key sentences in the public opinion document are selected to form a text abstract, and the text abstract replaces the document for emotion recognition. This preserves the amount of information needed for emotion recognition while keeping the focus of the document, and so can improve recognition accuracy.
Referring to Fig. 5, Fig. 5 is a schematic flow diagram detailing step S401 in Fig. 4. In this embodiment, the key sentences in the target public opinion document are extracted through the following processing flow:
Step S4011, performing sentence segmentation on the target public opinion document to obtain all sentences forming the target public opinion document;
In this embodiment, before the key sentences are extracted, the target public opinion document must first be split into sentences, so as to obtain all sentences forming the document.
In this embodiment, periods, semicolons, exclamation marks, and the like are preferably used as sentence separators, and the document is segmented into sentences by identifying these separators.
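A minimal sketch of separator-based sentence segmentation is given below. The exact separator set is an assumption (the text names only a period, a semicolon, an exclamation mark, "etc."), and `SENTENCE_SEPARATORS` and `split_sentences` are hypothetical names.

```python
import re
from typing import List

# Assumed separator set: full-width and ASCII period, semicolon,
# exclamation mark and question mark.
SENTENCE_SEPARATORS = "。；！？.;!?"


def split_sentences(document: str) -> List[str]:
    """Split a document at runs of separator characters, dropping empty pieces."""
    pattern = "[" + re.escape(SENTENCE_SEPARATORS) + "]+"
    return [part.strip() for part in re.split(pattern, document) if part.strip()]
```

Treating the ASCII period as a separator also splits decimals such as "3.5", so a practical implementation would likely need additional rules for such cases.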
Step S4012, calculating the similarity between the title of the target public opinion document and each sentence;
In this embodiment, the target public opinion document preferably has a title. The title is generally an overview of the whole document and is therefore most representative of its content, so sentences with high similarity to the title can be regarded as key sentences of the public opinion document.
This embodiment does not limit the method used to calculate the similarity between the title and the sentences of the public opinion document. For example, the BM25 (Best Matching 25) algorithm or a cosine similarity algorithm may be used.
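As one possible illustration of the cosine similarity option, the sketch below compares bag-of-words term-count vectors. The function names are hypothetical, and the inputs are assumed to be already tokenized; the embodiment does not prescribe how the vectors are built.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Cosine similarity between the term-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title or sentence is treated as dissimilar
    return dot / (norm_a * norm_b)


def title_similarities(title_tokens: List[str],
                       sentences_tokens: List[List[str]]) -> List[float]:
    """Similarity of the title to each sentence, in sentence order."""
    return [cosine_similarity(title_tokens, s) for s in sentences_tokens]
```

The resulting scores can then be used to sort the sentences in descending order of similarity to the title.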
Step S4013, sorting each sentence based on the similarity;
Step S4014, selecting, by means of a maximal marginal relevance algorithm, a specified number of sentences from all of the sorted sentences or from a specified number of the sorted sentences;
Step S4015, taking the selected sentences as the key sentences of the target public opinion document to form the text abstract.
In this embodiment, calculating the similarity between the title of the target public opinion document and each sentence yields a plurality of candidate key sentences. However, these sentences may also be similar to one another, in which case their information overlaps and the total amount of information carried by the text abstract is reduced.
Therefore, in this embodiment, sentences with overlapping information among the sorted sentences are further removed by the maximal marginal relevance (MMR) algorithm, and a specified number of sentences are selected as key sentences. The selected key sentences are relevant to the title and overlap with one another as little as possible, so the text abstract formed by merging them retains more information.
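The selection step can be sketched with a greedy form of MMR, which repeatedly picks the sentence maximizing `lam * relevance - (1 - lam) * redundancy`. In this sketch, `mmr_select`, the trade-off weight `lam` and its default value are assumptions; `relevance` holds each sentence's similarity to the title and `pairwise` holds sentence-to-sentence similarities, both computed beforehand by any similarity measure.

```python
from typing import List, Sequence


def mmr_select(relevance: Sequence[float],
               pairwise: Sequence[Sequence[float]],
               k: int,
               lam: float = 0.7) -> List[int]:
    """Return indices of up to k sentences chosen greedily by MMR."""
    candidates = list(range(len(relevance)))
    selected: List[int] = []
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((pairwise[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam = 1.0` the procedure reduces to taking the top-k sentences by title similarity; smaller values penalize sentences that repeat what has already been selected.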
Referring to Fig. 6, Fig. 6 is a schematic flow diagram detailing step S402 in Fig. 4. In this embodiment, emotion recognition is performed on the text abstract through the following processing flow:
Step S4021, performing word segmentation on the text abstract to obtain a plurality of words;
In this embodiment, to satisfy the input format required by the emotion recognition model, the text abstract must first be converted into words through word segmentation. This embodiment does not limit the specific word segmentation method.
Step S4022, constructing word vectors corresponding to the words in the text abstract based on the words obtained by word segmentation;
Step S4023, merging the word vectors into sentence vectors, and inputting the sentence vectors into the emotion recognition model to perform emotion recognition on the text abstract.
This embodiment takes into account that, after the text abstract is segmented into words, the relevance between the words is greatly reduced. Word vectors are therefore constructed to preserve the relevance between words and thereby improve emotion recognition accuracy. A word vector converts a word in natural language into a dense vector that a computer can process.
In this embodiment, before the word vectors are input into the emotion recognition model, they must be merged into sentence vectors, each of which corresponds to a sentence of the text abstract before word segmentation. The sentence vectors are then input into the emotion recognition model, thereby achieving emotion recognition on the text abstract.
In this embodiment, constructing a word vector for each word preserves the associations that existed between the words before word segmentation, which improves emotion recognition accuracy.
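How word vectors are merged into a sentence vector is not specified here. One common choice, assumed purely for illustration, is to average the word vectors element-wise. In the sketch below, `sentence_vector` and the `embeddings` lookup table are hypothetical; in practice the table would come from a pre-trained word embedding model.

```python
from typing import Dict, List


def sentence_vector(words: List[str],
                    embeddings: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Each sentence of the text abstract yields one such vector, which would then be passed to the emotion recognition model.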
The invention also provides a computer readable storage medium.
The computer readable storage medium of the present invention stores a public opinion emotion recognition program which, when executed by a processor, implements the steps of the public opinion emotion recognition method described in any of the above embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to these embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may make many modifications without departing from the spirit of the present invention and the scope of the appended claims. Any equivalent structure or equivalent flow transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (6)

1. The public opinion emotion recognition method is characterized by comprising the following steps of:
performing topic clustering on a plurality of public opinion documents in the same field by adopting a topic model algorithm to obtain a plurality of topic clusters, wherein each topic cluster comprises one or a plurality of documents;
performing positive and negative emotion labeling on the corresponding topic clusters to obtain documents with positive and negative emotion labels;
taking the document with the positive and negative emotion labels as a training sample, and training an emotion recognition model;
based on the emotion recognition model, performing emotion recognition on the target public opinion document to be recognized;
the emotion recognition of the target public opinion document to be recognized based on the emotion recognition model comprises the following steps:
sentence segmentation is carried out on the target public opinion document to obtain all sentences forming the target public opinion document;
calculating the similarity between the title of the target public opinion document and each sentence;
sorting each sentence based on the similarity;
selecting, by means of a maximal marginal relevance algorithm, a specified number of sentences from all of the sorted sentences or from a specified number of the sorted sentences;
taking the selected sentences as key sentences in the target public opinion documents to form text abstracts;
carrying out emotion recognition on the text abstract based on the emotion recognition model;
the positive and negative emotion labeling of the corresponding topic clusters comprises the following steps:
acquiring a topic cluster designated by a user and the positive or negative emotion corresponding to that topic cluster;
and labeling the topic cluster designated by the user with the positive or negative emotion.
2. The public opinion emotion recognition method of claim 1, wherein the positive and negative emotion labeling of the corresponding topic clusters comprises:
screening out, based on a preset emotion dictionary, topic clusters with emotional tendencies from all topic clusters, and labeling them with positive and negative emotions.
3. The public opinion emotion recognition method of claim 1 or 2, further comprising, after the step of labeling positive and negative emotions on the corresponding topic clusters to obtain a document with positive and negative emotion labels:
judging whether the number of topic clusters labeled with positive and negative emotions in the current round of topic clustering meets the condition for forming training samples;
if yes, not performing the next round of topic clustering;
if not, increasing the number of topic clusters output by topic clustering and continuing to use the topic model algorithm to perform the next round of topic clustering on the public opinion documents.
4. The public opinion emotion recognition method of claim 1, wherein emotion recognition of the text abstract based on the emotion recognition model comprises:
word segmentation is carried out on the text abstract to obtain a plurality of words;
constructing word vectors corresponding to the words in the text abstract based on the words obtained by word segmentation;
and merging the word vectors into sentence vectors, and inputting the sentence vectors into the emotion recognition model to perform emotion recognition on the text abstract.
5. A public opinion emotion recognition device comprising a memory, a processor and a public opinion emotion recognition program stored on the memory and executable on the processor, the public opinion emotion recognition program when executed by the processor implementing the steps of the public opinion emotion recognition method of any of claims 1-4.
6. A computer-readable storage medium, wherein a public opinion emotion recognition program is stored on the computer-readable storage medium, which when executed by a processor, implements the steps of the public opinion emotion recognition method of any of claims 1-4.
CN201811096799.3A 2018-09-18 2018-09-18 Public opinion emotion recognition method and device and computer readable storage medium Active CN109359296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811096799.3A CN109359296B (en) 2018-09-18 2018-09-18 Public opinion emotion recognition method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811096799.3A CN109359296B (en) 2018-09-18 2018-09-18 Public opinion emotion recognition method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109359296A CN109359296A (en) 2019-02-19
CN109359296B true CN109359296B (en) 2023-08-18

Family

ID=65351399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811096799.3A Active CN109359296B (en) 2018-09-18 2018-09-18 Public opinion emotion recognition method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109359296B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888981B (en) * 2019-10-30 2022-11-01 深圳价值在线信息科技股份有限公司 Title-based document clustering method and device, terminal equipment and medium
CN111475640A (en) * 2020-04-03 2020-07-31 支付宝(杭州)信息技术有限公司 Text emotion recognition method and device based on emotion abstract
CN113515594A (en) * 2021-04-28 2021-10-19 京东数字科技控股股份有限公司 Intention recognition method, intention recognition model training method, device and equipment
CN113762343B (en) * 2021-08-04 2024-03-15 德邦证券股份有限公司 Method, device and storage medium for processing public opinion information and training classification model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544242A (en) * 2013-09-29 2014-01-29 广东工业大学 Microblog-oriented emotion entity searching system
CN106202032A (en) * 2016-06-24 2016-12-07 广州数说故事信息科技有限公司 A kind of sentiment analysis method towards microblogging short text and system thereof
CN107239439A (en) * 2017-04-19 2017-10-10 同济大学 Public sentiment sentiment classification method based on word2vec
CN107862087A (en) * 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856109B2 (en) * 2012-06-21 2014-10-07 Microsoft Corporation Topical affinity badges in information retrieval
US20140095148A1 (en) * 2012-10-03 2014-04-03 Kanjoya, Inc. Emotion identification system and method

Also Published As

Publication number Publication date
CN109359296A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN109359296B (en) Public opinion emotion recognition method and device and computer readable storage medium
CN109189901B (en) Method for automatically discovering new classification and corresponding corpus in intelligent customer service system
CN106650943B (en) Auxiliary writing method and device based on artificial intelligence
US10719664B2 (en) Cross-media search method
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN107622056B (en) Training sample generation method and device
CN109670039B (en) Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis
CN104503998B (en) For the kind identification method and device of user query sentence
JP2022130635A (en) Conference support system, conference support device, method for supporting conference, and program
CN111105209B (en) Job resume matching method and device suitable for person post matching recommendation system
CN111651996B (en) Digest generation method, digest generation device, electronic equipment and storage medium
CN112052356B (en) Multimedia classification method, apparatus and computer readable storage medium
CN111104526A (en) Financial label extraction method and system based on keyword semantics
CN105653547B (en) Method and device for extracting text keywords
CN107247751B (en) LDA topic model-based content recommendation method
CN111767385A (en) Intelligent question and answer method and device
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN109615009B (en) Learning content recommendation method and electronic equipment
CN109492105A (en) A kind of text sentiment classification method based on multiple features integrated study
CN112131876A (en) Method and system for determining standard problem based on similarity
CN109800305A (en) Based on the microblogging mood classification method marked naturally
EP2369504A1 (en) System
CN110110143B (en) Video classification method and device
CN110738047A (en) Microblog user interest mining method and system based on image-text data and time effect
JP2005292958A (en) Teacher data preparation device and program, language analysis processor and program and summary processor and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant