CN116911286A - Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium - Google Patents

Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium

Info

Publication number
CN116911286A
Authority
CN
China
Prior art keywords
barrage
emotion
target
sample
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310907870.6A
Other languages
Chinese (zh)
Inventor
田园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202310907870.6A
Publication of CN116911286A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a dictionary construction method, an emotion analysis method and device, equipment and a storage medium. The dictionary construction method comprises the following steps: obtaining a barrage training sample and the expression text and pigment text in each barrage of the sample; recognizing the semantics of the expression text and determining its semantic expression; determining the emotion tendency of the expression text according to the semantic expression; determining the emotion category of the pigment text according to the user's emotion classification of the pigment text; constructing a barrage emotion symbol dictionary according to the emotion tendency of the expression text, the pigment text and the emotion category of the pigment text; and constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary. The emotion analysis method comprises: acquiring a barrage to be analyzed; performing emotion category analysis on the barrage to be analyzed according to a preset emotion analysis dictionary, and determining the target barrages that contain pigment characters as well as the emotion categories they represent; and obtaining an emotion analysis result according to the target barrages and the emotion categories they represent. The method improves the accuracy of barrage emotion analysis.

Description

Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium
Technical Field
The present application relates to the field of computer natural language processing technologies, and in particular, to a dictionary construction method, an emotion analysis device, equipment, and a storage medium.
Background
Short videos are currently very popular in China. While they enrich people's lives, content such as vulgar material, harmful public opinion and implied violence can harm users' physical and mental health. Although large numbers of reviewers audit the content published by media users, it is difficult to guarantee that the information contained in every short video is accurately identified, so many videos slip through review and may carry strongly negative overtones, which adversely affects the development of children and teenagers. Emotion recognition and analysis of the barrage (bullet-screen comment) information published on short videos is therefore very important for filtering harmful or negative short videos and for public opinion monitoring. Barrage emotion analysis is a form of text emotion analysis, also called text opinion mining: the process of mining text with emotional tendencies and classifying its emotions.
In the prior art, text emotion analysis mainly falls into two categories: machine learning methods and rule-based emotion dictionary methods. Machine learning approaches to emotion analysis are mainly divided into supervised and semi-supervised classification methods. Common supervised machine learning methods include support vector machines (SVM), maximum entropy models, naive Bayes, and classification models based on improved LSTM (Long Short-Term Memory) networks. Semi-supervised emotion analysis methods use labeled and unlabeled data at the same time, predicting and classifying the unlabeled data when labeled data is limited. Rule-based emotion dictionary methods start from a public emotion dictionary, expand the emotion dictionary for the target domain, set semantic rules, and calculate emotion values using emotion polarity, syntactic patterns, grammatical analysis and similar techniques to achieve emotion classification.
However, although barrage emotion analysis is a kind of text emotion analysis, applying a traditional text emotion analysis method directly to barrages yields low accuracy because of certain characteristics of barrage text.
Disclosure of Invention
The application provides a dictionary construction method, an emotion analysis device, equipment and a storage medium, which are used for solving the problem of low accuracy of barrage emotion analysis.
In a first aspect, the present application provides a dictionary construction method, including:
acquiring a barrage training sample, and acquiring the expression text and pigment text in each barrage of the barrage training sample;
carrying out semantic recognition on the expression text, and determining semantic expression of the expression text;
determining emotion tendency information of the expression text according to semantic expression of the expression text;
determining the emotion category of the pigment text in response to the user's emotion classification of the pigment text;
constructing a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text;
and constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
In an embodiment of the application, acquiring the barrage training sample and the expression text and pigment text in each barrage of the barrage training sample comprises the following steps:
acquiring a barrage training sample, wherein the barrage training sample comprises a plurality of barrages;
traversing the barrages in the barrage training sample, and determining target characters in each barrage;
determining a target text in the barrage according to the target characters;
and distinguishing the target text according to its character composition, thereby determining the expression text and pigment text in each barrage of the barrage training sample.
In an embodiment of the application, target keywords in the barrage training sample are extracted, wherein the target keywords are keywords that appear in at least two barrages, and each such keyword contains at least two characters;
carrying out semantic annotation on the target keywords, and determining semantic expression of the target keywords;
determining emotion tendency information of the target keywords according to semantic expressions of the target keywords;
constructing a barrage keyword dictionary according to the target keywords and emotion tendency information of the target keywords;
and constructing a barrage emotion analysis dictionary according to the barrage keyword dictionary and the barrage emotion symbol dictionary.
In an embodiment of the application, a target barrage training sample is obtained in response to a user's emotion tendency labeling operation on the barrage training sample, wherein the target barrage training sample comprises the barrage training sample and the emotion tendency labels of the barrages in the barrage training sample;
determining target stop words in the target barrage training sample according to a preset stop word list and the target barrage training sample, wherein a target stop word is a stop word that indicates the target emotion tendency in a barrage;
constructing a barrage stop word dictionary according to the target stop words;
and constructing a barrage emotion analysis dictionary according to the barrage stop word dictionary and the barrage emotion symbol dictionary.
In an embodiment of the application, degree adverbs in the barrage training sample are extracted in response to an extraction operation by the user;
responding to the emotion classification labeling operation of the user on the degree adverbs, and obtaining labeling results of the degree adverbs;
constructing a degree adverb emotion dictionary according to a preset degree adverb emotion dictionary and a labeling result of the degree adverbs;
and constructing a barrage emotion analysis dictionary according to the degree adverb emotion dictionary and the barrage emotion symbol dictionary.
In a second aspect, the present application provides an emotion analysis method, comprising:
acquiring a bullet screen to be analyzed;
according to a preset emotion analysis dictionary, emotion category analysis is carried out on the barrage to be analyzed, a first target barrage containing pigment characters in the barrage to be analyzed and emotion categories represented by the first target barrage are determined, wherein the emotion analysis dictionary is a barrage emotion analysis dictionary according to any one of claims 1-5;
And obtaining emotion analysis results according to the first target barrage and emotion categories represented by the first target barrage.
In the embodiment of the application, after emotion analysis is performed on the barrage to be analyzed according to the preset emotion analysis dictionary and the first target barrage containing the pigment characters in the barrage to be analyzed is determined, the method further comprises the following steps:
according to a preset emotion analysis dictionary, performing emotion tendency analysis on the other barrages, and determining second target barrages among the other barrages, wherein the other barrages are the barrages to be analyzed other than the first target barrages, and the second target barrages are those of the other barrages whose emotion tendency is the target tendency;
performing word segmentation on the second target barrage to obtain a word segmentation set;
screening out stop words in the word segmentation set according to a preset barrage stop word dictionary to obtain a target word segmentation set;
inputting the target word segmentation set into a preset data enhancement classification model to obtain an emotion classification result of the target word, wherein the data enhancement classification model comprises a data enhancement model for enhancing the data of the target word segmentation set and a classification model for classifying an output result of the data enhancement model.
In the embodiment of the application, before the target word segmentation set is input into the preset data enhancement classification model to obtain the emotion classification result of the target word, the method further comprises the following steps:
acquiring a first labeled barrage sample, a second labeled barrage sample, a first unlabeled barrage sample and a second unlabeled barrage sample, wherein the second labeled barrage sample is a labeled barrage sample obtained by performing data enhancement on the first labeled barrage sample, and the second unlabeled barrage sample is an unlabeled barrage sample obtained by performing data enhancement on the first unlabeled barrage sample;
training the initial model according to the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample and the second unlabeled barrage sample to obtain a target training model;
extracting, from the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample with sample prediction labels and the second unlabeled barrage sample with sample prediction labels, at least two groups of target barrage sample data, wherein the sample prediction labels are obtained by inputting the first unlabeled barrage sample and the second unlabeled barrage sample into the target training model;
performing vectorized characterization on the two groups of target barrage sample data, and interpolating between the two groups of target barrage sample data using an implicit linear interpolation method to obtain transition samples with different labels;
Inputting a virtual sample into an initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data;
and according to the prediction result, adjusting the initial MLP neural network training model to obtain the target MLP neural network training model.
In the embodiment of the application, according to the prediction result, an initial MLP neural network training model is adjusted to obtain a target MLP neural network training model, which comprises the following steps:
determining KL divergence of an initial MLP neural network training model according to the prediction result;
determining whether an initial MLP neural network training model converges or not according to the KL divergence;
if the initial MLP neural network training model converges, determining the initial MLP neural network training model as a target MLP neural network training model;
if the initial MLP neural network training model is not converged, adjusting parameters of the initial MLP neural network training model, and re-executing the steps of inputting a virtual sample into the initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data until the adjusted initial MLP neural network training model is converged to obtain a target MLP neural network training model.
In a third aspect, the present application provides a barrage dictionary building apparatus applied to barrage dictionary building, comprising:
the first acquisition module is used for acquiring a barrage training sample and the expression text and pigment text in each barrage of the barrage training sample;
the first determining module is used for carrying out semantic recognition on the expression text and determining semantic expression of the expression text;
the second determining module is used for determining emotion tendency information of the expression text according to the semantic expression of the expression text;
the third determining module is used for determining the emotion type of the pigment text in response to the emotion classification of the pigment text by the user;
the first construction module is used for constructing a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text;
and the second construction module is used for constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
In a fourth aspect, the present application provides a barrage emotion analysis device, applied to barrage emotion analysis, including:
the second acquisition module acquires a bullet screen to be analyzed;
a fourth determining module, configured to perform emotion type analysis on the bullet screen to be analyzed according to a preset emotion analysis dictionary, and determine a first target bullet screen containing pigment characters in the bullet screen to be analyzed and emotion types represented by the first target bullet screen, where the emotion analysis dictionary is a bullet screen emotion analysis dictionary according to any one of claims 1 to 5;
The obtaining module is used for obtaining emotion analysis results according to the first target barrage and emotion categories represented by the first target barrage.
In a fifth aspect, the present application provides an electronic device, comprising: a processor, a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of the present application.
In a sixth aspect, the present application provides a computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of the application.
The application provides a dictionary construction method, an emotion analysis method and device, equipment and a storage medium. A training sample is obtained, together with the pigment characters and emoticons in its barrages that need to be trained; the semantic text part of each emoticon is recognized to obtain its semantic expression; the emotion tendency of each emoticon is labeled according to its semantic expression, and the user labels the emotion category of each pigment character. An emotion analysis dictionary is constructed from the pigment characters, the pigment character emotion categories, the emoticons and the emoticon tendencies; a barrage to be analyzed is then obtained, the first target barrages containing pigment characters are found by matching against the emotion analysis dictionary, and emotion analysis is performed on the first target barrages using the pigment character dictionary within the emotion analysis dictionary, yielding the emotion classification result of the first target barrages. By constructing the dictionary accurately and then combining the emotion analysis dictionary with the emotion analysis method provided by the application, the analysis result can be obtained, achieving the technical effect of improving the accuracy of barrage emotion analysis.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flow chart of a dictionary construction method according to an embodiment of the present application;
FIG. 2 is a flowchart of another dictionary construction method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of an emotion analysis method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of another emotion analysis method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a dictionary creating apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an emotion analysis device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
Quickly analyzing the emotion classification of the barrages attached to large numbers of short videos is challenging. On the one hand, new popular expressions constantly appear in barrages; their word combinations are unusual and, after wide circulation, often acquire fixed emotional connotations. For expressions such as "yawning" and "Olive", existing emotion analysis methods have low accuracy in recognizing and classifying this kind of popular barrage emotion. On the other hand, special symbols such as pigment characters and emoticons can be posted on short videos, making it difficult to analyze their emotion classification with a traditional emotion dictionary method. Meanwhile, effective barrage data sets are scarce and highly time-sensitive, and it is difficult to label a large number of barrages in a short time, so existing supervised emotion classification models have low accuracy; existing semi-supervised classification models do not make full use of the limited labeled data, and processing labeled and unlabeled data separately easily causes the trained model to overfit, which makes it difficult to analyze barrage emotion classification with traditional machine learning methods. In summary, it is difficult to analyze barrage emotion classification using traditional methods.
In order to solve the above problems, the application provides a dictionary construction method, an emotion analysis method and device, equipment and a storage medium. A training sample is obtained, together with the pigment characters and emoticons in its barrages that need to be trained; the semantic text part of each emoticon is recognized to obtain its semantic expression; the emotion tendency of each emoticon is labeled according to its semantic expression, and the user labels the emotion category of each pigment character. An emotion analysis dictionary is constructed from the pigment characters, the pigment character emotion categories, the emoticons and the emoticon tendencies; a barrage to be analyzed is then obtained, the first target barrages containing pigment characters are found by matching against the emotion analysis dictionary, and emotion analysis is performed on the first target barrages using the pigment character dictionary within the emotion analysis dictionary, yielding the emotion classification result of the first target barrages. By constructing the dictionary accurately and then combining the emotion analysis dictionary with the emotion analysis method provided by the application, the analysis result can be obtained, achieving the technical effect of improving the accuracy of barrage emotion analysis.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The embodiment of the application provides a dictionary construction method, an emotion analysis method and device, equipment and a storage medium. The execution subject may be a server, and the server may be a mobile phone, a tablet, a computer or another device; the implementation of the execution subject is not specifically limited here, as long as the execution subject can: acquire a barrage training sample and the expression text and pigment text in each barrage of the barrage training sample; perform semantic recognition on the expression text and determine its semantic expression; determine the emotion tendency information of the expression text according to its semantic expression; determine the emotion category of the pigment text in response to the user's emotion classification of the pigment text; construct a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text; construct a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary; acquire a barrage to be analyzed; perform emotion category analysis on the barrage to be analyzed according to a preset emotion analysis dictionary, and determine the first target barrages containing pigment characters and the emotion categories they represent, where the emotion analysis dictionary is the barrage emotion analysis dictionary; and obtain the emotion analysis result according to the first target barrages and the emotion categories they represent.
Among them, MLP (Multilayer Perceptron) is a common neural network model. It consists of an input layer, several hidden layers and an output layer, each layer containing multiple neurons connected by weights. Each neuron computes the weighted sum of its inputs and weights, applies a nonlinear activation function, and passes the result to the neurons of the next layer. The learning process of the whole model is realized through the back-propagation algorithm. MLPs are widely used in machine learning tasks such as classification, regression and clustering, have good nonlinear modeling and generalization capability, and are one of the basic components of deep learning.
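For illustration only, the following minimal sketch shows the forward pass of such an MLP; the layer sizes, activation functions and variable names are assumptions chosen for the example and are not taken from the embodiment:

```python
import numpy as np

def relu(x):
    # Nonlinear activation applied after each hidden layer's weighted sum.
    return np.maximum(0.0, x)

def softmax(x):
    # Turns the output layer's scores into class probabilities.
    e = np.exp(x - x.max())
    return e / e.sum()

def mlp_forward(x, weights, biases):
    # Each layer computes a weighted sum of its inputs plus a bias,
    # applies a nonlinear activation, and feeds the next layer.
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    return softmax(h @ weights[-1] + biases[-1])

# Example: a 300-dimensional barrage vector classified into 7 emotion categories.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(300, 64)), rng.normal(size=(64, 7))]
biases = [np.zeros(64), np.zeros(7)]
probs = mlp_forward(rng.normal(size=300), weights, biases)
```

In training, the weights would be updated by back-propagation of a loss gradient, as described above.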
The concept of KL divergence comes from probability theory and information theory. KL divergence is also known as relative entropy, mutual entropy, discrimination information, Kullback entropy or Kullback-Leibler divergence (of which "KL divergence" is the shorthand). In machine learning and deep learning, KL divergence is widely used in variational autoencoders (Variational AutoEncoder, VAE for short), the EM algorithm and GAN networks. Statistically, KL divergence can be used to measure the degree of difference between two distributions: the smaller the difference, the smaller the KL divergence, and vice versa. When the two distributions are identical, the KL divergence is 0.
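As an illustrative sketch, the KL divergence between two discrete distributions can be computed as follows; the example distributions over seven emotion categories are assumptions for illustration:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): non-negative, 0 when the distributions are identical,
    # and larger the more the two distributions differ.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

p = [0.70, 0.10, 0.05, 0.05, 0.05, 0.03, 0.02]
q = [0.60, 0.15, 0.05, 0.05, 0.05, 0.05, 0.05]
print(kl_divergence(p, q))
```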
Fig. 1 is a flow chart of a dictionary construction method according to an embodiment of the present application. The execution body of the method may be a server or another server, which is not specifically limited in this embodiment. As shown in fig. 1, the method includes:
S101, acquiring a barrage training sample, and acquiring the expression text and pigment text in each barrage of the barrage training sample.
The barrage training sample is a set of barrages chosen by the user, and the user may choose barrages from videos whose popularity reaches a certain level. The popularity level may be, for example, a million plays per day, a million plays per month, a place in the top hundred of a video website's daily popularity ranking, a place in the top hundred of its monthly popularity ranking, and so on.
The specific format of an emoticon is "[text expression]".
The pigment text is formed by combining irregular symbols.
In this embodiment, the barrages chosen by the user are used as the barrage training sample. Each barrage in the barrage training sample is scanned, and when the specific emoticon format is recognized, that part of the barrage is extracted as expression text. The text part and the emoticons of each barrage in the barrage training sample are removed to obtain the pigment text. In addition, some expression text and pigment text can be obtained from the Sogou expression library as a supplement.
S102, carrying out semantic recognition on the expression text, and determining semantic expression of the expression text.
In an embodiment of the application, an emoticon look-up program traverses the whole content of the barrage, locates the format symbols "[" and "]", extracts the text between them, and composes that text into an emoticon.
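As a minimal illustrative sketch of this extraction step (the regular expression and function name are assumptions, not part of the embodiment):

```python
import re

# Emoticons are assumed to follow the "[text expression]" format described above.
EMOTICON_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def extract_emoticons(barrage: str) -> list:
    # Traverse the whole barrage and return the text found between "[" and "]".
    return EMOTICON_PATTERN.findall(barrage)

print(extract_emoticons("front high energy [laugh], watch out [surprised]"))
# ['laugh', 'surprised']
```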
Semantic recognition here refers to a natural language processing technique, i.e. the process of recognizing the meaning of the expression text in order to determine its semantic expression. In this embodiment, the semantic expression of the expression text may be determined by comparison with a preset expression-to-word-sense table, or by recognizing the barrage context.
S103, determining emotion tendency information of the expression text according to semantic expression of the expression text.
The emotion tendency information may be divided into emotional information and non-emotional information. Emotional information may include expressions of mood, for example: disgust and sadness. Non-emotional information may include descriptions of objective objects, such as houses and vehicles. Emotion tendency information can be determined through manual identification, or judged by calculating the correlation between the expression text and the seven basic emotion words.
In this embodiment, a seven-class emotion judgment algorithm improved from the SO-PMI (Semantic Orientation Pointwise Mutual Information) algorithm is used to judge the emotion category of the expression text. On the basis of the binary judgment of the traditional SO-PMI algorithm, seven-class emotion classification is achieved by calculating the degree of association between the expression text and the seven basic emotion words. The traditional binary SO-PMI judgment only decides whether the text to be analyzed is emotional or non-emotional: a group of non-emotional reference words and a group of emotional reference words are preset, and the PMI (Pointwise Mutual Information) values between the word to be analyzed and the non-emotional words, and between the word to be analyzed and the emotional words, are calculated respectively. The higher the calculated value, the higher the similarity and the stronger the association; when the mutual information value of two words is less than zero, their semantics are mutually exclusive; when it is equal to zero, the two words are unrelated. Here, seven basic emotion words are set, and the expression text is judged against them: if manual recognition shows that the expression text contains emotion, for example the expression belongs to one of the seven emotions, the expression is judged as emotional. If manual recognition shows that the expression text contains no emotion, for example: man, woman, child and so on, the expression is judged as non-emotional.
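The following sketch illustrates the idea under simplifying assumptions: PMI is estimated from barrage-level co-occurrence counts, and the seven basic emotion words are supplied as a mapping from emotion name to reference word; the names and the zero-handling convention are assumptions for illustration only:

```python
import math

def pmi(word_a, word_b, barrages):
    # Pointwise mutual information estimated from barrage co-occurrence counts.
    n = len(barrages)
    count_a = sum(word_a in b for b in barrages)
    count_b = sum(word_b in b for b in barrages)
    count_ab = sum(word_a in b and word_b in b for b in barrages)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        return 0.0  # treated as "unrelated" per the convention above
    return math.log(count_ab * n / (count_a * count_b))

def seven_class_so_pmi(word, barrages, basic_emotion_words):
    # Associate `word` with each of the seven basic emotion words via PMI.
    # If every value is <= 0 the word is judged non-emotional; otherwise it is
    # assigned the emotion with the strongest association.
    scores = {emotion: pmi(word, ref, barrages)
              for emotion, ref in basic_emotion_words.items()}
    best_emotion, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_emotion if best_score > 0 else "non-emotional"
```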
S104, determining emotion types of the pigment text in response to emotion classification of the pigment text by the user.
Here, emotion classification means determining the emotion category of the pigment text according to users' usage habits and the appearance of the pigment text. For example, a pigment text that people often use to express happiness is classified as happiness, and a pigment text whose appearance people find likeable is classified as like.
The emotion categories may be: like, sadness, disgust, surprise, happiness, anger and fear.
S105, constructing a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text.
The emotion symbol dictionary comprises the pigment text and the expression text.
Each pigment text is stored together with its emotion category, and each emoticon (expression text) is stored together with its emotion tendency; together these entries form the emotion symbol dictionary. When performing emotion analysis with the emotion dictionary, if a pigment text in the barrage to be analyzed matches a pigment text in the dictionary, the emotion category of that pigment text can be looked up through it. Similarly, if an emoticon in the barrage to be analyzed matches an emoticon in the emotion dictionary, the emotion tendency of that emoticon can be looked up through it.
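A minimal sketch of such a dictionary and its lookup is shown below; the example entries are invented placeholders, not entries of the actual dictionary:

```python
# Pigment texts are stored with their emotion category, emoticons with their
# emotion tendency; emotion analysis is then a plain dictionary match.
pigment_dictionary = {
    "(^_^)": "happiness",   # placeholder entries
    "(T_T)": "sadness",
}
emoticon_dictionary = {
    "laugh": "emotional",
    "house": "non-emotional",
}

def match_symbols(barrage, emoticons):
    # Look up every pigment text and emoticon of the barrage that the dictionary contains.
    pigment_hits = {p: c for p, c in pigment_dictionary.items() if p in barrage}
    emoticon_hits = {e: emoticon_dictionary[e] for e in emoticons if e in emoticon_dictionary}
    return pigment_hits, emoticon_hits

print(match_symbols("good day (^_^) [laugh]", ["laugh"]))
```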
S106, constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
The barrage emotion analysis dictionary may be a barrage emotion symbol dictionary, and the construction of the barrage emotion analysis dictionary may include a manner of incorporating the barrage emotion symbol dictionary into an existing barrage emotion analysis dictionary, or may include a manner of using the barrage emotion symbol dictionary as the barrage emotion analysis dictionary.
In the embodiment of the application, the method for constructing the emotion dictionary further comprises the following steps:
extracting target keywords in the barrage training sample, wherein the target keywords are keywords that appear in at least two barrages, and each such keyword contains at least two characters;
carrying out semantic annotation on the target keywords, and determining semantic expression of the target keywords;
determining emotion tendency information of the target keywords according to semantic expressions of the target keywords;
constructing a barrage keyword dictionary according to the target keywords and emotion tendency information of the target keywords;
and constructing a barrage emotion analysis dictionary according to the barrage keyword dictionary and the barrage emotion symbol dictionary.
Here, the keywords may be popular expressions and hot words. Since popular expressions and hot words are rarely single words, i.e. one character, a minimum of two characters is kept. The keywords are the words with the highest frequency of occurrence in the barrage training sample, for example: the 1000 most frequent words in the barrage training sample, or the 100 most frequent words. The target keywords can be obtained by the first q common subsets method. The first q common subsets consist of 1 longest common subsequence and q-1 longest common substrings. The longest common subsequence is defined as follows: given two strings M and N, let P be a combination of characters that both strings contain in the same order; the longest such P is the longest common subsequence of the two known strings. When there are n barrages, n(n-1)/2 longest common subsequences can be obtained, where any two barrages of the barrage training sample produce one common subset. The length of a common subset is at least greater than 2. Several machines are used to compute the character combinations, in the same order, that the barrages have in common; these combinations are then counted to form a set of popular expressions, which is finally displayed as a word cloud so that current hot words and popular expressions can be found quickly. For example, a barrage training sample consists of five barrages: "front high energy early warning, this is when the mobile phone shows its card-resistant capability", "the front is really high energy!", "front high energy! Note that this is not an early warning!", "front high energy early warning. Non-fighters evacuate quickly", "front high energy early warning, fighters make a last struggle!!!". Combining these 5 barrages pairwise gives 10 barrage combinations, and the longest common substrings of the pairs are: "front high energy", "front high energy early warning", "front high energy!", "front high energy!", "front high energy early warning", "front high energy early warning!", "front high energy early warning fighter". With q set to 3, i.e. a minimum of 3 characters kept, the first q common substring algorithm gives the results: "[front high energy]", "[front high energy early warning]", "[high energy!, front]", "[front, high energy]", "[front high energy, early warning]", "[front high energy, early warning, fighter]". By counting the number of occurrences of each word, ranking the counts, and filtering out some invalid words ("forward", "high energy"), the keywords of the 5 barrages are obtained: "front high energy" and "early warning".
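A sketch of the pairwise common-substring step is given below; it uses a standard dynamic-programming longest common substring with simple counting, which is an illustrative simplification of the first q common subsets method described above:

```python
from collections import Counter
from itertools import combinations

def longest_common_substring(a, b):
    # Dynamic-programming longest common substring of two barrages.
    best, best_end = 0, 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best:
                    best, best_end = dp[i][j], i
    return a[best_end - best:best_end]

def popular_expressions(barrages, min_len=3):
    # Collect the common substrings of all n(n-1)/2 barrage pairs and count them;
    # the counts can then be ranked or rendered as a word cloud.
    counts = Counter()
    for a, b in combinations(barrages, 2):
        sub = longest_common_substring(a, b)
        if len(sub) >= min_len:
            counts[sub] += 1
    return counts.most_common()
```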
In the embodiment of the application, the method for constructing the emotion dictionary further comprises the following steps:
responding to a user's emotion tendency labeling operation on the barrage training sample to obtain a target barrage training sample, wherein the target barrage training sample comprises the barrage training sample and the emotion tendency labels of the barrages in the barrage training sample;
determining target stop words in the target barrage training sample according to a preset stop word list and the target barrage training sample, wherein a target stop word is a stop word that indicates the target emotion tendency in a barrage;
constructing a barrage stop word dictionary according to the target stop word;
and constructing a barrage emotion analysis dictionary according to the barrage stop word dictionary and the barrage emotion symbol dictionary.
In information retrieval, certain characters or words are automatically filtered out before or after natural language data (or text) is processed, in order to save storage space and improve search efficiency; these are called stop words. For example: "foraging", "therebetween", "spraying perfume".
The emotion tendency labeling operation means labeling the emotion tendency of each barrage in the barrage training sample: barrages with emotion are labeled as emotional, and barrages without emotion are labeled as non-emotional.
The preset stop word list can be any existing stop word list; for example, open-source stop word lists for literature journals and news reports are taken as the basic stop word list.
In an embodiment of the application, the user manually labels the emotion tendency of each barrage in the barrage training sample; then, for each stop word in the basic stop word list, the frequency with which it appears in emotional barrages and in non-emotional barrages of the barrage training sample is counted manually or by program, and if its frequency in emotional barrages is more than twice its frequency in non-emotional barrages, the stop word is removed from the basic stop word list.
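A minimal sketch of this screening step is shown below, assuming the labeled sample is available as (barrage text, is_emotional) pairs; the names and data layout are assumptions for illustration:

```python
def prune_stop_words(basic_stop_words, labeled_barrages):
    # Keep a stop word only if it does not lean heavily toward emotional barrages.
    kept = []
    for word in basic_stop_words:
        emotional = sum(1 for text, is_emo in labeled_barrages if is_emo and word in text)
        neutral = sum(1 for text, is_emo in labeled_barrages if not is_emo and word in text)
        # Remove the stop word when it appears in emotional barrages more than
        # twice as often as in non-emotional ones.
        if emotional > 2 * neutral:
            continue
        kept.append(word)
    return kept
```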
In the embodiment of the application, the method for constructing the barrage dictionary further comprises the following steps:
responding to an extraction operation of the user, and extracting the degree adverbs in the barrage training sample;
responding to the emotion classification labeling operation of the user on the degree adverbs, and obtaining labeling results of the degree adverbs;
constructing a degree adverb emotion dictionary according to a preset degree adverb emotion dictionary and a labeling result of the degree adverbs;
and constructing a barrage emotion analysis dictionary according to the degree adverb emotion dictionary and the barrage emotion symbol dictionary.
Degree adverbs are used to emphasize or weaken the speaker's subjective emotion, so they can affect the emotional intensity of the text's semantics or change the emotion tendency of a sentence; text containing degree adverbs therefore carries a certain emotion. For example: very, too much, most, very much.
The degree adverb dictionary consists of a plurality of degree adverbs and corresponding emotion tendencies.
The emotion classification labeling operation on the degree adverbs means that adverbs carrying emotion are labeled as emotional and adverbs without emotion are labeled as non-emotional. For example, "perfectly", "ideally" and "well" are labeled as emotional, while "very" is labeled as non-emotional.
The dictionary construction method provided by this embodiment of the application can perform emotion classification for the pigment characters and emoticons that are characteristic of barrages, and can therefore improve the accuracy of barrage emotion classification.
Fig. 2 is a flowchart of another dictionary construction method according to an embodiment of the present application. The execution body of the method may be a server or another server, which is not specifically limited in this embodiment. As shown in fig. 2, the method includes:
S201, a set of barrages chosen by the user is used as the training set;
S202, the emoticons and pigment characters of each barrage in the training set are acquired, the emotion tendency of each emoticon is confirmed, and the emotion category of each pigment character is determined; a barrage expression emotion dictionary is constructed from the emoticons, the corresponding emoticon tendencies, the pigment characters and the pigment character emotion categories. The emoticons are obtained by recognizing their specific format in each barrage of the training set, the pigment characters are obtained by removing the text part and the emoticon part of each barrage in the training set, the emotion tendencies of the emoticons are labeled by recognizing the semantic text of the emoticons, and the pigment characters are classified manually into emotion categories;
S203, the first q common subsets method and a word cloud are used to find the hundred most frequent popular expressions in the training set; the emotion tendencies of the popular expressions are labeled manually; a barrage popular-expression emotion dictionary is built from the popular expressions and the corresponding emotion tendencies;
S204, open-source stop word lists for literature journals and news reports are used as the basic stop word list; the user manually labels the emotion tendency of each barrage in the training set, and for each stop word in the basic stop word list, the frequency with which it appears in emotional barrages and in non-emotional barrages of the training set is counted; if its frequency in emotional barrages is more than twice its frequency in non-emotional barrages, the stop word is removed from the basic stop word list; each stop word in the basic stop word list is judged in this way, and a barrage stop word dictionary is finally obtained;
S205, on the basis of a preset open-source degree adverb dictionary, barrage degree adverbs are added manually and their emotion tendencies are labeled manually, and a barrage degree adverb dictionary is constructed from the barrage degree adverbs and their emotion tendencies;
S206, the basic emotion dictionary consists of the BosonNLP dictionary and a preset Chinese emotion dictionary;
The BosonNLP dictionary is built by analyzing and mining a large amount of annotated data from forums and news and by sorting and collecting words with emotion polarity, finally forming an emotion polarity dictionary of 114,472 entries. The BosonNLP dictionary contains a large number of words such as popular network and spoken expressions, and therefore covers unconventional, informal text well.
S207, forming a barrage emotion dictionary according to the expression emotion dictionary, the popular emotion dictionary, the stop word dictionary, the degree adverb dictionary and the basic emotion dictionary.
With the emotion dictionary construction method provided by this embodiment of the application, barrage corpora containing pigment characters and expressions can be filtered and classified by constructing a barrage-domain emotion dictionary, and neutral barrage corpora without emotion tendency are filtered out, which reduces the analysis of non-emotional data and improves analysis accuracy.
Fig. 3 is a schematic flow chart of an emotion analysis method according to an embodiment of the present application. The execution body of the method may be a server or another server, which is not specifically limited in this embodiment. As shown in fig. 3, the method includes:
S301, acquiring a barrage to be analyzed.
The barrage to be analyzed is the barrage needing emotion analysis.
S302, analyzing emotion types of the bullet screen to be analyzed according to a preset emotion analysis dictionary, and determining a first target bullet screen containing pigment characters in the bullet screen to be analyzed and emotion types represented by the first target bullet screen, wherein the emotion analysis dictionary is the bullet screen emotion analysis dictionary in the embodiment of the application.
The emotion analysis dictionary comprises the emotion symbol dictionary, the keyword dictionary, the stop word dictionary and the degree adverb dictionary. In this embodiment, the trained pigment character dictionary within the emotion symbol dictionary is used for analysis, so that all barrages containing pigment characters among the barrages to be analyzed are obtained. The recognized pigment characters are matched against the corresponding pigment characters in the emotion dictionary to obtain the emotion category of each pigment character; in this way, the first target barrages containing pigment characters and the emotion categories they represent are determined through emotion category analysis.
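As an illustrative sketch of this matching step (the function and variable names are assumptions):

```python
def split_by_pigment_dictionary(barrages, pigment_dictionary):
    # First target barrages: barrages containing at least one pigment character
    # from the dictionary, returned with the emotion categories they represent.
    first_target, other = [], []
    for barrage in barrages:
        categories = [cat for pigment, cat in pigment_dictionary.items() if pigment in barrage]
        if categories:
            first_target.append((barrage, categories))
        else:
            other.append(barrage)
    return first_target, other
```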
S303, according to the first target barrage and the emotion type represented by the first target barrage, obtaining an emotion analysis result.
The emotion analysis result may comprise the barrage text containing the pigment character together with the emotion category of that pigment character. For example: the barrage "Haoyer" with its pigment character, and the classification "happy".
In the embodiment of the application, emotion analysis is performed on the barrage to be analyzed according to a preset emotion analysis dictionary, and after the first target barrage containing the pigment characters in the barrage to be analyzed is determined, the method further comprises the steps of:
according to a preset emotion analysis dictionary, performing emotion tendency analysis on the other barrages, and determining second target barrages among the other barrages, wherein the other barrages are the barrages to be analyzed other than the first target barrages, and the second target barrages are those of the other barrages whose emotion tendency is the target tendency;
performing word segmentation on the second target barrage to obtain a word segmentation set;
screening out stop words in the word segmentation set according to a preset barrage stop word dictionary to obtain a target word segmentation set;
inputting the target word segmentation set into a preset data enhancement classification model to obtain an emotion classification result of the target word, wherein the data enhancement classification model comprises a data enhancement model for enhancing the data of the target word segmentation set and a classification model for classifying an output result of the data enhancement model.
The other barrages are barrages that do not contain pigment characters, for example: "front high-energy early warning", "front row robbing". A second target barrage is an emotional barrage among the other barrages that does not contain pigment characters, for example: "really is a happy day", "is good and bad".
The word segmentation process is to divide sentences into single words, and the word segmentation set is a set formed by all words in the barrage.
The data enhancement classification model performs data enhancement on the labeled barrages and the unlabeled barrages to obtain labeled enhanced barrages and unlabeled enhanced barrages; a supervised model is trained with the labeled data and the labeled enhanced data, and the unlabeled data and the unlabeled enhanced barrages are fed into the supervised model to obtain labels for them. Having obtained the labeled data, the labeled enhanced data, the pseudo-labeled unlabeled data and the pseudo-labeled unlabeled enhanced barrages, two of these four groups of data are selected, and linear interpolation is performed between the two selected groups to obtain transition data. The transition data and the two selected groups of data are fed into the MLP classification model for emotion classification, and when the KL value of the MLP is smaller than a certain threshold, the classification result is output.
In this embodiment, if a barrage among the other barrages has no emotion tendency, its analysis ends. If a barrage among the other barrages has an emotion tendency, i.e. it is a second target barrage, word segmentation is performed on it, stop words are removed according to the barrage stop word dictionary to obtain the target word segmentation set, and the target word segmentation set is then input into the preset data enhancement classification model to obtain the emotion classification result of the target words.
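A minimal sketch of the segmentation and stop-word filtering of a second target barrage is shown below; the segmenter is passed in as a function, since the embodiment does not prescribe a particular tokenizer:

```python
def prepare_target_tokens(second_target_barrage, segment, stop_word_dictionary):
    # Segment the barrage into words and drop barrage stop words; the resulting
    # target word segmentation set is fed to the data enhancement classification model.
    tokens = segment(second_target_barrage)
    return [t for t in tokens if t not in stop_word_dictionary]

# Example with a trivial whitespace segmenter standing in for a real tokenizer.
tokens = prepare_target_tokens("really a happy day", str.split, {"a"})
```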
In an embodiment of the application, before the target word segmentation set is input into the preset data enhancement classification model to obtain the emotion classification result of the target words, the method further comprises the following steps:
acquiring a first labeled barrage sample, a second labeled barrage sample, a first unlabeled barrage sample and a second unlabeled barrage sample, wherein the second labeled barrage sample is a labeled barrage sample obtained by performing data enhancement on the first labeled barrage sample, and the second unlabeled barrage sample is an unlabeled barrage sample obtained by performing data enhancement on the first unlabeled barrage sample;
training the initial model according to the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample and the second unlabeled barrage sample to obtain a target training model;
extracting, from the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample with sample prediction labels and the second unlabeled barrage sample with sample prediction labels, at least two groups of target barrage sample data, wherein the sample prediction labels are obtained by inputting the first unlabeled barrage sample and the second unlabeled barrage sample into the target training model;
performing vectorized characterization on the two groups of target barrage sample data, and interpolating between the two groups of target barrage sample data using an implicit linear interpolation method to obtain transition samples with different labels;
inputting a virtual sample into an initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data;
and according to the prediction result, adjusting the initial MLP neural network training model to obtain the target MLP neural network training model.
Here the user selects barrage texts and labels their emotion categories, obtaining the first labeled barrage sample; the remaining barrage texts form the first unlabeled barrage sample. The second labeled barrage sample is obtained by performing data enhancement on the first labeled barrage sample using back-translation, and the second unlabeled barrage sample is obtained by performing data enhancement on the first unlabeled barrage sample using back-translation. Back-translation is a common data enhancement technique: a barrage is translated into several languages and translated back, producing several differently worded texts with the same meaning. For example, the original barrage "I agree with you; the core is to do what you yourself consider important, rather than follow the mainstream or the environment" back-translated through English becomes "I agree with what you say; the core is to do what you think is meaningful, rather than follow the mainstream or the environment", and back-translated through French becomes "I agree with what you say; mainly do what you think is meaningful, rather than follow the mainstream."
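A sketch of back-translation augmentation is given below; `translate` is a placeholder for whatever translation service is used, and the pivot languages are assumptions for illustration:

```python
def back_translate(barrage, translate, pivot_languages=("en", "fr")):
    # Translate the barrage into each pivot language and back again; every round
    # trip yields a differently worded text with the same meaning.
    augmented = []
    for lang in pivot_languages:
        pivot = translate(barrage, src="zh", dst=lang)
        augmented.append(translate(pivot, src=lang, dst="zh"))
    return augmented
```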
The initial model may be a supervised model. The supervised model is trained with the first labeled sample and the first labeled enhanced sample; the trained supervised model then predicts labels for the second unlabeled sample and the second unlabeled enhanced sample, and the predictions are weighted to generate pseudo labels. The pseudo-label distribution is then amplified using a sharpening function, as in the sketch below.
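A minimal sketch of the pseudo-label generation and sharpening, assuming a scikit-learn-style `predict_proba` interface and a sharpening temperature of 0.5 (the application does not fix either):

```python
import numpy as np

def sharpen(p: np.ndarray, T: float = 0.5) -> np.ndarray:
    """Amplify (sharpen) a predicted label distribution; a smaller T gives a more peaked result."""
    p = p ** (1.0 / T)
    return p / p.sum()

def pseudo_label(model, unlabeled, unlabeled_aug) -> np.ndarray:
    """Average (weight) the supervised model's predictions on an unlabeled barrage
    and its back-translated version, then sharpen the result so that both share
    the same generated label."""
    p1 = model.predict_proba(unlabeled)       # prediction on the original barrage
    p2 = model.predict_proba(unlabeled_aug)   # prediction on the enhanced barrage
    return sharpen((p1 + p2) / 2.0)
```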
The target barrage sample data are obtained by randomly extracting any two of the four groups of data: the first labeled barrage sample, the first labeled enhanced barrage sample, the second unlabeled barrage sample with generated labels, and the second unlabeled enhanced barrage sample with generated labels.
Barrages containing pigment characters are removed from the two groups of target barrage sample data, stop words are then removed, and the remaining plain text is vectorized to obtain the vectorized representation.
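Purely as an illustration of this preprocessing, assuming already-segmented (space-joined) text, a simple regex check for pigment characters and a TF-IDF vectorizer, none of which are mandated by the application:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative pattern for pigment-character content; the application's actual
# dictionary-based check is richer than a single regular expression.
KAOMOJI_PATTERN = re.compile(r"[（(][^（）()]*[;:：；TQ_^=~][^（）()]*[)）]")

def filter_and_vectorize(segmented_barrages: list[str], stop_words: set):
    """Drop barrages that still contain pigment characters, remove stop words,
    and vectorize the remaining plain text."""
    plain = [b for b in segmented_barrages if not KAOMOJI_PATTERN.search(b)]
    cleaned = [" ".join(t for t in b.split() if t not in stop_words) for b in plain]
    vectorizer = TfidfVectorizer()            # one possible vectorized representation
    return vectorizer.fit_transform(cleaned), vectorizer
```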
In the embodiment of the present application, according to a prediction result, an initial MLP neural network training model is adjusted to obtain a target MLP neural network training model, including:
determining KL divergence of an initial MLP neural network training model according to the prediction result;
determining whether an initial MLP neural network training model converges or not according to the KL divergence;
if the initial MLP neural network training model converges, determining the initial MLP neural network training model as a target MLP neural network training model;
If the initial MLP neural network training model is not converged, adjusting parameters of the initial MLP neural network training model, and re-executing the steps of inputting a virtual sample into the initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data until the adjusted initial MLP neural network training model is converged to obtain a target MLP neural network training model.
The KL divergence-based loss is either a supervised loss or a consistency loss. When at least one of the two target barrage samples comes from the first labeled barrage sample or the first labeled enhanced barrage sample, most of the information comes from labeled data and the model loss is the supervised loss. When both samples come from the second unlabeled barrage sample with generated labels or the second unlabeled enhanced barrage sample with generated labels, the model loss is the consistency loss.
When the KL divergence of the current iteration is smaller than that of the previous iteration, the parameters of the initial MLP neural network training model are updated, and the virtual sample, comprising the transition samples and the two groups of target barrage sample data, is input into the initial MLP neural network training model again to obtain a new prediction result, until the adjusted initial MLP neural network training model converges and the target MLP neural network training model is obtained.
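The loss selection and the iteration stopping rule described above could be sketched as follows; the PyTorch implementation, the cross-entropy form of the supervised loss and the convergence tolerance are assumptions, not part of the application:

```python
import torch.nn.functional as F

def pair_loss(pred_logits, target_probs, pair_has_labeled: bool):
    """Supervised loss when at least one sample of the pair comes from labeled
    (or labeled-enhanced) data; consistency loss (KL divergence) when both come
    from pseudo-labeled unlabeled data."""
    log_p = F.log_softmax(pred_logits, dim=-1)
    if pair_has_labeled:
        return -(target_probs * log_p).sum(dim=-1).mean()        # cross-entropy form
    return F.kl_div(log_p, target_probs, reduction="batchmean")  # consistency term

def has_converged(current_kl: float, previous_kl: float, tol: float = 1e-4) -> bool:
    """Assumed stopping rule: keep iterating while the loss keeps dropping and
    stop once the improvement falls below a small tolerance."""
    return previous_kl - current_kl < tol
```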
Fig. 4 is a schematic flow chart of another emotion analysis method according to an embodiment of the present application. As shown in fig. 4, the execution body of the method may be a server or another device, which is not specifically limited in this embodiment. As shown in fig. 4, the method includes:
S401, acquiring barrages to be analyzed;
S402, identifying all barrages containing pigment characters among the barrages to be analyzed according to the pigment character dictionary in the trained barrage emotion dictionary, and analyzing the pigment character part of each such barrage through the pigment character dictionary to obtain an emotion classification result;
S403, passing all barrages to be analyzed that do not contain pigment characters through the barrage expression symbol dictionary, the barrage popular word dictionary, the barrage degree adverb dictionary and the basic emotion dictionary of the trained barrage emotion dictionary to obtain the barrages with emotion tendency among the barrages to be analyzed, and outputting the remaining neutral barrages;
S404, performing word segmentation on the barrages with emotion tendency among the barrages to be analyzed;
S405, removing stop words from the segmented words through the barrage stop word dictionary to obtain a word segmentation set for vectorized representation;
S406, inputting the word segmentation set into the data enhancement classification model to obtain the emotion classification result of the word segmentation set;
S407, training the model before the word segmentation set is input into the data enhancement classification model: the unlabeled barrages are enhanced with back-translation to obtain the unlabeled barrages and the unlabeled enhanced barrages, and the labeled barrages are enhanced with back-translation to obtain the labeled barrages and the labeled enhanced barrages. A fully supervised model is trained with the labeled barrages and the labeled enhanced barrages; the trained supervised model predicts labels for the unlabeled barrages and the unlabeled enhanced barrages, the predictions are weighted to generate final pseudo labels, and the label result of each unlabeled barrage is kept consistent with that of its corresponding unlabeled enhanced barrage. The predicted pseudo-label features of the unlabeled barrages are amplified with a sharpening function, the loss of the current supervised model is calculated, and if the loss is smaller than the previous one, the fully supervised model is iterated.
S408, merging the labeled barrages, the labeled enhanced barrages, the unlabeled barrages with generated labels and the unlabeled enhanced barrages with generated labels output by the fully supervised model, and randomly selecting two groups of data from them. Barrages containing pigment characters are filtered out, the plain text is vectorized, the two randomly extracted groups are interpolated by an implicit linear interpolation method to produce transition samples with different labels, all data pairs are input into the MLP neural network training model to obtain prediction results, the KL divergence is calculated, and if the KL divergence is smaller than the previous one, the MLP neural network training model is iterated. A sketch of this interpolation step follows.
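The implicit linear interpolation in S408 could be sketched as follows; the Beta-distribution mixing coefficient is an assumption borrowed from common mixup-style practice and is not specified by the application:

```python
import numpy as np

def interpolate_pair(x1, y1, x2, y2, alpha: float = 0.75):
    """Implicit linear interpolation between two vectorized barrage samples and
    their (one-hot or sharpened) labels, yielding a transition sample whose
    label mixes the two source labels."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)         # keep the mixture closer to the first sample
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix

# The transition samples plus the two original groups form the "virtual samples"
# fed to the MLP, whose KL divergence is tracked across iterations as in S408.
```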
According to the emotion analysis method provided by the embodiment of the application, with only a small amount of labeled data, the generalization and robustness of the model against input perturbations or adversarial samples can be enhanced, further improving the accuracy of the video barrage emotion analysis system.
Fig. 5 is a diagram illustrating a structure of a bullet screen dictionary construction apparatus according to an embodiment of the present application. As shown in fig. 5, the bullet screen dictionary construction apparatus 50 includes: a first acquisition module 501, a first determination module 502, a second confirmation module 503, a third determination module 504, a first construction module 505, and a second construction module 506.
Wherein:
the first acquisition module 501 is configured to acquire a bullet screen to-be-trained sample and the expression text and pigment text in each bullet screen in the bullet screen to-be-trained sample;
the first determining module 502 is configured to perform semantic recognition on the expression text and determine the semantic expression of the expression text;
the second confirmation module 503 is configured to determine emotion tendency information of the expression text according to the semantic expression of the expression text;
the third determining module 504 is configured to determine the emotion category of the pigment text in response to the user's emotion category labeling of the pigment text;
the first construction module 505 is configured to construct a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text;
the second construction module 506 is configured to construct a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
In an embodiment of the present application, the first obtaining module 501 may specifically be configured to:
acquiring a bullet screen to-be-trained sample, wherein the bullet screen to-be-trained sample comprises a plurality of bullet screens;
traversing the barrage in the barrage to-be-trained sample, and determining target characters in the barrage;
determining a target text in the barrage according to the target characters;
and distinguishing the target text according to the character composition of the target text, and determining the expression text and the pigment text in each barrage in the barrage to-be-trained sample.
In the embodiment of the present application, the first obtaining module 501 may be specifically further configured to:
extracting target keywords in a bullet screen to-be-trained sample, wherein the target keywords are the same keywords appearing in any at least two bullet screens, and the same keywords at least comprise two characters;
carrying out semantic annotation on the target keywords, and determining semantic expression of the target keywords;
determining emotion tendency information of the target keywords according to semantic expressions of the target keywords;
constructing a barrage keyword dictionary according to the target keywords and emotion tendency information of the target keywords;
and constructing a barrage emotion analysis dictionary according to the barrage keyword dictionary and the barrage emotion symbol dictionary.
In the embodiment of the present application, the first obtaining module 501 may be specifically further configured to:
responding to the user to perform emotion tendency labeling operation on the bullet screen to-be-trained sample to obtain a target bullet screen to-be-trained sample, wherein the target bullet screen to-be-trained sample comprises the bullet screen to-be-trained sample and emotion tendency labels of bullet screens in the bullet screen to-be-trained sample;
judging target stop words in a target barrage to-be-trained sample according to a preset stop word list and the target barrage to-be-trained sample, wherein the target stop words are stop words representing a trend of target emotion in the barrage;
Constructing a barrage stop word dictionary according to the target stop word;
and constructing a barrage emotion analysis dictionary according to the barrage stop word dictionary and the barrage emotion symbol dictionary.
In the embodiment of the present application, the first obtaining module 501 may be specifically further configured to:
responding to the extraction operation of a user, and extracting the degree adverbs in the bullet screen to-be-trained sample;
responding to the emotion classification labeling operation of the user on the degree adverbs, and obtaining labeling results of the degree adverbs;
constructing a degree adverb emotion dictionary according to a preset degree adverb emotion dictionary and a labeling result of the degree adverbs;
and constructing a barrage emotion analysis dictionary according to the degree adverb emotion dictionary and the barrage emotion symbol dictionary.
Fig. 6 is a diagram illustrating a structure of a barrage emotion analysis device according to an embodiment of the present application. As shown in fig. 6, the barrage emotion analysis device 60 includes: a second acquisition module 601, a fourth determination module 602, an acquisition module 603. Wherein:
the second acquisition module 601 acquires a bullet screen to be analyzed;
a fourth determining module 602, configured to perform emotion classification analysis on the bullet screen to be analyzed according to a preset emotion analysis dictionary, and determine a first target bullet screen including pigment characters in the bullet screen to be analyzed and an emotion classification represented by the first target bullet screen, where the emotion analysis dictionary is a bullet screen emotion analysis dictionary according to any one of claims 1 to 5;
The obtaining module 603 obtains an emotion analysis result according to the first target barrage and the emotion category represented by the first target barrage.
In the embodiment of the present application, the obtaining module 603 may be specifically further configured to:
according to a preset emotion analysis dictionary, performing emotion tendency analysis on the other barrages, and determining a second target barrage among the other barrages, wherein the other barrages are the barrages to be analyzed other than the first target barrage, and the second target barrage is a barrage among the other barrages whose emotion tendency is the target tendency;
performing word segmentation on the second target barrage to obtain a word segmentation set;
screening out stop words in the word segmentation set according to a preset barrage stop word dictionary to obtain a target word segmentation set;
inputting the target word segmentation set into a preset data enhancement classification model to obtain an emotion classification result of the target word segmentation, wherein the data enhancement classification model comprises a data enhancement model for enhancing the data of the target word segmentation set and a classification model for classifying an output result of the data enhancement model.
In the embodiment of the present application, the obtaining module 603 may be specifically further configured to:
acquiring a first labeled barrage sample, a second labeled barrage sample, a first unlabeled barrage sample and a second unlabeled barrage sample, wherein the second labeled barrage sample is a labeled barrage sample obtained after data enhancement is performed on the first labeled barrage sample, and the second unlabeled barrage sample is an unlabeled barrage sample obtained after data enhancement is performed on the first unlabeled barrage sample;
Training the initial model according to the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample and the second unlabeled barrage sample to obtain a target training model;
extracting the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample with a sample prediction label and the second unlabeled barrage sample with a sample prediction label to obtain target barrage sample data, wherein the target barrage sample data has at least two groups, and the sample prediction labels are obtained by inputting the first unlabeled barrage sample and the second unlabeled barrage sample into the target training model;
vectorizing and characterizing the two groups of target barrage sample data, and interpolating between the two groups of target barrage sample data by an implicit linear interpolation method to obtain transition samples with different labels;
inputting a virtual sample into an initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data;
and according to the prediction result, adjusting the initial MLP neural network training model to obtain the target MLP neural network training model.
In the embodiment of the present application, the obtaining module 603 may be specifically further configured to:
determining KL divergence of an initial MLP neural network training model according to the prediction result;
determining whether an initial MLP neural network training model converges or not according to the KL divergence;
if the initial MLP neural network training model converges, determining the initial MLP neural network training model as a target MLP neural network training model;
if the initial MLP neural network training model is not converged, adjusting parameters of the initial MLP neural network training model, and re-executing the steps of inputting a virtual sample into the initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data until the adjusted initial MLP neural network training model is converged to obtain a target MLP neural network training model.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 70 includes:
the electronic device 70 may include one or more processing cores 'processors 701, one or more computer-readable storage media's memory 702, communication components 703, and the like. Wherein the processor 701, the memory 702 and the communication means 703 are connected by a bus 707.
In a specific implementation, at least one processor 701 executes computer-executable instructions stored in a memory 702, so that the at least one processor 701 executes the above dictionary construction method, emotion analysis method.
The specific implementation process of the processor 701 can be referred to the above method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In the embodiment shown in fig. 7, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present application may be directly executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory may comprise a high-speed random access memory (RAM), and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus in the drawings of the present application is not limited to only one bus or one type of bus.
In some embodiments, a computer program product is also presented, comprising a computer program or instructions which, when executed by a processor, implement the steps of any of the dictionary construction method, the emotion analysis method described above.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A dictionary construction method, the method comprising:
acquiring a bullet screen to-be-trained sample and the expression text and pigment text in each bullet screen in the bullet screen to-be-trained sample;
carrying out semantic recognition on the expression text, and determining semantic expression of the expression text;
determining emotion tendency information of the expression text according to semantic expression of the expression text;
determining emotion classification of the pigment text in response to emotion classification of the pigment text by a user;
constructing a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text;
and constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
2. The method of claim 1, wherein the obtaining the bullet screen to be trained sample and the expression text and the pigment text in each bullet screen in the bullet screen to be trained sample comprises:
Acquiring a bullet screen to-be-trained sample, wherein the bullet screen to-be-trained sample comprises a plurality of bullet screens;
traversing the barrage in the barrage to-be-trained sample, and determining target characters in the barrage;
determining a target text in the barrage according to the target character;
and distinguishing the target text according to the character composition of the target text, and determining the expression text and the pigment text in each barrage in the barrage to-be-trained sample.
3. The method according to claim 1, wherein the method further comprises:
extracting target keywords in the bullet screen to-be-trained sample, wherein the target keywords are the same keywords appearing in any at least two bullet screens, and the same keywords at least comprise two characters;
carrying out semantic annotation on the target keywords, and determining semantic expression of the target keywords;
determining emotion tendency information of the target keywords according to semantic expressions of the target keywords;
constructing a barrage keyword dictionary according to the target keywords and emotion tendency information of the target keywords;
and constructing a barrage emotion analysis dictionary according to the barrage keyword dictionary and the barrage emotion symbol dictionary.
4. The method according to claim 1, wherein the method further comprises:
responding to the user to perform emotion tendency labeling operation on the barrage to-be-trained sample to obtain a target barrage to-be-trained sample, wherein the target barrage to-be-trained sample comprises the barrage to-be-trained sample and emotion tendency labels of barrages in the barrage to-be-trained sample;
judging target stop words in the target barrage to-be-trained sample according to a preset stop word list and the target barrage to-be-trained sample, wherein the target stop words are stop words representing target emotion tendencies in the barrage;
constructing a barrage stop word dictionary according to the target stop word;
and constructing a barrage emotion analysis dictionary according to the barrage stop word dictionary and the barrage emotion symbol dictionary.
5. The method according to claim 1, wherein the method further comprises:
responding to the extraction operation of a user, and extracting the degree adverbs in the bullet screen to-be-trained sample;
responding to the emotion classification labeling operation of the user on the degree adverbs, and obtaining labeling results of the degree adverbs;
constructing a degree adverb emotion dictionary according to a preset degree adverb emotion dictionary and a labeling result of the degree adverbs;
And constructing a barrage emotion analysis dictionary according to the degree adverb emotion dictionary and the barrage emotion symbol dictionary.
6. A method of emotion analysis, the method comprising:
acquiring a bullet screen to be analyzed;
according to a preset emotion analysis dictionary, emotion category analysis is carried out on the bullet screen to be analyzed, a first target bullet screen containing pigment characters in the bullet screen to be analyzed and emotion categories represented by the first target bullet screen are determined, wherein the emotion analysis dictionary is a bullet screen emotion analysis dictionary according to any one of claims 1-5;
and obtaining emotion analysis results according to the first target barrage and emotion categories represented by the first target barrage.
7. The method of claim 6, wherein after performing emotion category analysis on the bullet screen to be analyzed according to the preset emotion analysis dictionary and determining the first target bullet screen containing pigment characters in the bullet screen to be analyzed, the method further comprises:
according to a preset emotion analysis dictionary, performing emotion tendency analysis on other barrages, and determining a second target barrage in the other barrages, wherein the other barrages are barrages which are not the first target barrages in the barrages to be analyzed, and the second target barrages are barrages with emotion tendency as target tendency in the other barrages;
Performing word segmentation on the second target barrage to obtain a word segmentation set;
screening out stop words in the word segmentation set according to a preset barrage stop word dictionary to obtain a target word segmentation set;
inputting the target word segmentation set into a preset data enhancement classification model to obtain an emotion classification result of the target word segmentation, wherein the data enhancement classification model comprises a data enhancement model for enhancing the data of the target word segmentation set and a classification model for classifying an output result of the data enhancement model.
8. The method of claim 7, wherein before the inputting the target word segmentation set into a preset data enhancement classification model to obtain an emotion classification result of the target word, the method further comprises:
acquiring a first labeled barrage sample, a second labeled barrage sample, a first unlabeled barrage sample and a second unlabeled barrage sample, wherein the second labeled barrage sample is a labeled barrage sample obtained after data enhancement is performed on the first labeled barrage sample, and the second unlabeled barrage sample is an unlabeled barrage sample obtained after data enhancement is performed on the first unlabeled barrage sample;
Training an initial model according to the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample and the second unlabeled barrage sample to obtain a target training model;
extracting the first labeled barrage sample, the second labeled barrage sample, the first unlabeled barrage sample with a sample prediction label and the second unlabeled barrage sample with a sample prediction label to obtain target barrage sample data, wherein the target barrage sample data has at least two groups, and the sample prediction labels are obtained by inputting the first unlabeled barrage sample and the second unlabeled barrage sample into the target training model;
vectorizing and characterizing the two groups of target barrage sample data, and interpolating between the two groups of target barrage sample data by an implicit linear interpolation method to obtain transition samples with different labels;
inputting a virtual sample into an initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data;
And adjusting the initial MLP neural network training model according to the prediction result to obtain a target MLP neural network training model.
9. The method of claim 8, wherein adjusting the initial MLP neural network training model based on the prediction results to obtain a target MLP neural network training model comprises:
determining the KL divergence of the initial MLP neural network training model according to the prediction result;
determining whether the initial MLP neural network training model converges or not according to the KL divergence;
if the initial MLP neural network training model converges, determining the initial MLP neural network training model as a target MLP neural network training model;
and if the initial MLP neural network training model is not converged, adjusting parameters of the initial MLP neural network training model, and re-executing the steps of inputting a virtual sample into the initial MLP neural network training model to obtain a prediction result, wherein the virtual sample comprises a transition sample and two groups of target barrage sample data until the adjusted initial MLP neural network training model is converged to obtain a target MLP neural network training model.
10. A bullet screen dictionary construction apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a bullet screen to-be-trained sample and the expression text and pigment text in each bullet screen in the bullet screen to-be-trained sample;
the first determining module is used for carrying out semantic recognition on the expression text and determining semantic expression of the expression text;
the second confirmation module is used for determining emotion tendency information of the expression text according to semantic expression of the expression text;
the third determining module is used for determining the emotion classification of the pigment text in response to the emotion classification of the pigment text by the user;
the first construction module is configured to construct a barrage emotion symbol dictionary according to the emotion tendency information of the expression text, the pigment text and the emotion category of the pigment text;
and the second construction module is used for constructing a barrage emotion analysis dictionary according to the barrage emotion symbol dictionary.
11. A barrage emotion analysis device, the device comprising:
the second acquisition module acquires a bullet screen to be analyzed;
a fourth determining module, configured to perform emotion type analysis on the bullet screen to be analyzed according to a preset emotion analysis dictionary, and determine a first target bullet screen containing pigment characters in the bullet screen to be analyzed and emotion types represented by the first target bullet screen, where the emotion analysis dictionary is a bullet screen emotion analysis dictionary according to any one of claims 1 to 5;
And the obtaining module is used for obtaining emotion analysis results according to the first target barrage and emotion categories represented by the first target barrage.
12. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 9.
13. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 9.
CN202310907870.6A 2023-07-21 2023-07-21 Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium Pending CN116911286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310907870.6A CN116911286A (en) 2023-07-21 2023-07-21 Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310907870.6A CN116911286A (en) 2023-07-21 2023-07-21 Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116911286A true CN116911286A (en) 2023-10-20

Family

ID=88352785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310907870.6A Pending CN116911286A (en) 2023-07-21 2023-07-21 Dictionary construction method, emotion analysis device, dictionary construction equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116911286A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium
CN117217218B (en) * 2023-11-08 2024-01-23 中国科学技术信息研究所 Emotion dictionary construction method and device for science and technology risk event related public opinion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination