CN113886580A - Emotion scoring method and device and electronic equipment - Google Patents
- Publication number
- CN113886580A (CN202111126984.4A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- text data
- language model
- scoring
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The application discloses an emotion scoring method, an emotion scoring device and electronic equipment. The method comprises the following steps: performing emotion category labeling on first text data to obtain second text data containing emotion category identifications; training a first language model with the second text data to obtain a second language model for emotion scoring; and performing emotion scoring on acquired text data based on the second language model to generate an emotion value corresponding to the text data. When the second language model obtained by this method scores acquired text data, the text data does not need to be labeled with emotion categories manually, which saves labor and time costs, removes the restriction that scoring only works on already-labeled sample data, and broadens the usage scenarios.
Description
Technical Field
The present application relates to the field of natural language processing, and in particular, to an emotion scoring method and apparatus, and an electronic device.
Background
With the growing popularity of the internet and the continuous expansion of the online population, the internet has become an important means for people to obtain information, and people can express their own opinions while obtaining it. These opinions usually carry the user's emotion and personal subjective views, and they appear on network platforms mainly in text form.
Mining and analyzing the text that carries user emotion and personal subjective opinions within a large amount of network information, judging whether it is positive or negative, and quantifying it as an emotion value are of great significance. For example, a merchant can attract other potential users by sharing users' evaluations of a product on a shopping website. A manufacturer can also plan its next round of production by analyzing users' evaluations, thereby increasing product sales. In addition, by analyzing and mining information about trending events, harmful opinions on the network can be found, so that public opinion can be guided in a positive direction in time and events that might inflame social conflicts can be prevented.
To mine such text from a large amount of network information and score its emotion, the emotion categories of the mined text generally need to be labeled. At present this labeling is done mainly by hand, which consumes a large amount of time and labor. Moreover, current emotion scoring methods are applicable only to sample data that has already been labeled, so their usage scenarios are limited.
Disclosure of Invention
The application provides an emotion scoring method, an emotion scoring device and electronic equipment. A first language model is trained to obtain a second language model, and when the second language model scores the emotion of acquired text data, the text data does not need to be labeled with emotion categories manually. This saves labor and time costs, removes the restriction to already-labeled sample data, and broadens the usage scenarios.
In a first aspect, the present application provides an emotion scoring method, including:
carrying out emotion category labeling on the first text data to obtain second text data containing emotion category identification;
training the first language model by using the second text data to obtain a second language model for emotion scoring;
and carrying out emotion scoring on the acquired text data based on the second language model to generate an emotion value corresponding to the text data.
By the aid of the method, after emotion scoring training is carried out on the first language model, the obtained second language model can carry out emotion scoring processing on the text data without emotion type labels.
In a possible design, before the emotion category labeling is performed on the first text data, and the second text data containing emotion category identification is obtained, the method further includes:
acquiring third text data;
and performing sentence segmentation processing, word segmentation processing and stop word processing on the third text data respectively to obtain the first text data.
By the method, the acquired third text data is processed, and the acquired first text data is used for emotion category marking and emotion scoring, so that the emotion value of the comment data can be more accurately evaluated.
Further, the emotion category labeling of the first text data to obtain second text data including emotion category identification includes:
analyzing and processing the first text data to obtain an emotion word set comprising a plurality of emotion expression words;
and according to the emotion word set, performing emotion category labeling on the first text data to obtain second text data containing the emotion category identification.
By the aid of the method, emotion category marking is carried out on the first text data, manual marking is avoided, and time cost and labor cost are saved.
Further, performing emotion word analysis processing on the first text data to obtain an emotion word set including a plurality of emotion expression words, including:
filtering and weighting the non-emotion expression words in the first text data to obtain a text vocabulary set;
and acquiring near-meaning words of the emotion expression words in the text vocabulary set, and adding the near-meaning words into the text vocabulary set to obtain the emotion word set.
By the method, the first text data is filtered and expanded to obtain the emotion word set for emotion category labeling.
Further, according to the emotion word set, performing emotion category labeling on the first text data to obtain the second text data including the emotion category identifier, including:
extracting all emotion expression words in the first text data according to the emotion word set;
respectively calculating a first emotion value corresponding to each emotion expression word;
calculating a second emotion value corresponding to the first text data according to each first emotion value;
determining the emotion type corresponding to the second emotion value;
and carrying out emotion category marking on the first text data to obtain the second text data containing the emotion category identification.
By the aid of the method, emotion category marking is carried out on the first text data, manual marking is avoided, and time cost and labor cost are saved.
Further, training the first language model with the second text data to obtain a second language model for emotion scoring, comprising:
according to the second text data, performing semantic analysis training on the first language model to obtain a third language model;
and performing emotion scoring training on the third language model according to the second text data to obtain the second language model for emotion scoring.
By the method, before emotion analysis training is carried out on the first language model, semantic analysis training is carried out, so that the finally obtained second language model is more suitable for text data of comment types, and has stronger semantic analysis capability, and word ambiguity is avoided.
Further, according to the second text data, performing emotion scoring training on the third language model to obtain the second language model for emotion scoring, including:
selecting M groups of training samples in the second text data, wherein each group of training samples comprises K emotion expression words, and M and K are positive integers greater than or equal to 1;
sequentially inputting each training sample set into the third language model, and respectively calculating to obtain semantic features corresponding to each training sample;
respectively calculating the emotion value of each training sample according to the semantic features in sequence;
sequentially judging whether the emotion value is larger than a preset threshold value or not;
if so, ending the emotion scoring training, otherwise, continuing to perform emotion scoring training on the third language model.
By the aid of the method, emotion scoring training is performed on the third language model, so that the second language model obtained after training can perform emotion scoring on the text data without emotion type labels.
In a second aspect, the present application provides an emotion scoring apparatus, comprising:
the marking module is used for marking the emotion types of the first text data to obtain second text data containing emotion type identifications;
the training module is used for training the first language model by using the second text data to obtain a second language model for emotion scoring;
and the scoring module is used for carrying out emotion scoring on the acquired text data based on the second language model and generating an emotion value corresponding to the text data.
In one possible design, the apparatus further includes:
the acquisition module is used for acquiring third text data;
and the processing module is used for performing sentence segmentation processing, word segmentation processing and stop word processing on the third text data respectively to obtain the first text data.
Further, the labeling module is specifically configured to:
analyzing and processing the first text data to obtain an emotion word set comprising a plurality of emotion expression words;
and according to the emotion word set, performing emotion category labeling on the first text data to obtain second text data containing the emotion category identification.
Further, the labeling module is further configured to:
filtering and weighting the non-emotion expression words in the first text data to obtain a text vocabulary set;
and acquiring near-meaning words of the emotion expression words in the text vocabulary set, and adding the near-meaning words into the text vocabulary set to obtain the emotion word set.
Further, the labeling module is further configured to:
extracting all emotion expression words in the first text data according to the emotion word set;
respectively calculating a first emotion value corresponding to each emotion expression word;
calculating a second emotion value corresponding to the first text data according to each first emotion value;
determining the emotion type corresponding to the second emotion value;
and carrying out emotion category marking on the first text data to obtain the second text data containing the emotion category identification.
Further, the training module is specifically configured to:
according to the second text data, performing semantic analysis training on the first language model to obtain a third language model;
and performing emotion scoring training on the third language model according to the second text data to obtain the second language model for emotion scoring.
Further, the training module is further configured to:
selecting M groups of training samples in the second text data, wherein each group of training samples comprises K emotion expression words, and M and K are positive integers greater than or equal to 1;
sequentially inputting each training sample set into the third language model, and respectively calculating to obtain semantic features corresponding to each training sample;
respectively calculating the emotion value of each training sample according to the semantic features in sequence;
sequentially judging whether the emotion value is larger than a preset threshold value or not;
if so, ending the emotion scoring training, otherwise, continuing to perform emotion scoring training on the third language model.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the emotion scoring method when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the emotion scoring method steps described above.
Based on the emotion scoring method, firstly, semantic analysis training is carried out on the first language model, and then emotion scoring training is carried out, so that the finally generated second language model has strong semantic analysis capability and emotion scoring accuracy. Meanwhile, in the process of carrying out emotion grading on the acquired text data by using the language model, emotion category marking on the text data is not needed manually, so that the labor cost and the time cost can be saved, the problem that only marked sample data is suitable can be avoided, and the use scene is diversified.
For each of the second to fourth aspects and possible technical effects of each aspect, reference is made to the above description of the possible technical effects of the first aspect or various possible schemes of the first aspect, and repeated description is omitted here.
Drawings
FIG. 1 is a flow chart of an emotion scoring method provided herein;
FIG. 2 is a schematic diagram illustrating the effect of annotating comment data provided by the present application;
FIG. 3 is a flowchart of an emotion scoring training method provided by the present application;
FIG. 4 is a schematic structural diagram of an emotion scoring apparatus provided in the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be further described in detail with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus embodiments or system embodiments. It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes the association relationship of the associated objects and covers three cases; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean that A and B are directly connected, or that A and B are connected through C. In addition, in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
To mine text that carries user emotion and personal subjective opinions from a large amount of network information and score its emotion, the emotion categories of the mined text generally need to be labeled. At present this labeling is done mainly by hand, which consumes a large amount of time and labor. Moreover, current emotion scoring methods are applicable only to sample data that has already been labeled, so their usage scenarios are limited.
In order to solve the problems, the application provides an emotion scoring method, after emotion scoring training is carried out on a first language model, an obtained second language model can carry out emotion scoring processing on text data without emotion category labels, the emotion scoring method can reduce time cost and labor cost, meanwhile, the problem that only labeled sample data is suitable can be avoided, and using scenes are diversified. The method and the device in the embodiment of the application are based on the same technical concept, and because the principles of the problems solved by the method and the device are similar, the device and the embodiment of the method can be mutually referred, and repeated parts are not repeated.
As shown in fig. 1, a flowchart of an emotion scoring method provided by the present application specifically includes the following steps:
S11, labeling emotion types of the first text data to obtain second text data containing emotion type identifications;
in the embodiment of the present application, before performing emotion type tagging on first text data, the first text data needs to be acquired, and a specific method for acquiring the first text data is as follows:
firstly, comment text data are obtained, and two methods are available for obtaining the text data, wherein one method is to access the comment text data from a database, and the other method is to obtain the text data through a crawler technology;
then, the acquired comment text data is processed, and the method mainly comprises three processing modes:
The first processing mode is sentence splitting, in which the acquired comment text data is split at delimiters; the delimiters include the period and the semicolon. Taking online classroom teaching comment data as an example, before sentence splitting all of the acquired comment data is treated as a whole:
{ The teacher has an earnest attitude in class and ample knowledge reserves, which broadens the students' knowledge. Secondly, the teacher uses teaching methods such as PPT, videos and inviting students to share during the course to help us learn. }.
After sentence splitting, the comment data is decomposed into a plurality of sentences:
{ The teacher has an earnest attitude in class and ample knowledge reserves, which broadens the students' knowledge. };
{ Secondly, the teacher uses teaching methods such as PPT, videos and inviting students to share during the course to help us learn. };
In the embodiment of the application, sentence splitting is performed on the acquired text data so that the emotion value of the comment data can be evaluated more accurately.
The second processing mode is word segmentation, which is the process of recombining a continuous character sequence into a word sequence according to a certain specification; it can be carried out with an open-source tool and a lexicon. Taking online classroom teaching comment data as an example, before word segmentation the acquired comment data is one continuous character sequence:
{ skilled skill is skillful, has infectivity and clear explanation. The multimedia and blackboard writing are reasonable in design and good in audio-visual effect. };
after word segmentation, the words are disconnected:
{ skilled skill is skillful, has infectivity and clear explanation. The multimedia and blackboard writing are reasonable in design and good in audio-visual effect. }.
In a Chinese sentence there are no spaces between words, and a single character can rarely express a specific meaning, so the sentence needs to be segmented into words to help the language model understand it.
The third processing mode is stop-word removal. Stop words are characters, words and symbols that have no practical meaning and appear in large numbers in the text. Stop-word removal filters out these characters, words and symbols, and is carried out mainly with an open-source stop-word list. Taking online classroom teaching comment data as an example, before stop-word removal the comment data is as follows:
{ skilled skill is skillful, has infectivity and clear explanation. The multimedia and blackboard writing are reasonable in design and good in audio-visual effect. };
after stop-word removal:
{ the skill is skillful, has clear multimedia and reasonable blackboard-writing design and has good audio-visual effect with infectious capacity clarification thought }.
Through the three processing modes, the acquired comment text data is processed, and the first text data can be obtained.
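The three processing modes above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word list is hypothetical, and the regular-expression tokenizer is a stand-in for the open-source segmentation tool and lexicon that the source mentions but does not name; Chinese text would need a dictionary-based segmenter instead.

```python
import re

# Hypothetical stop-word list; the source refers to an open-source list without naming it.
STOP_WORDS = {"the", "and", "is", "a", "of", "in"}


def split_sentences(text):
    """Split text at periods and semicolons (ASCII and full-width forms)."""
    parts = re.split(r"[.;。；]", text)
    return [p.strip() for p in parts if p.strip()]


def segment_words(sentence):
    """Stand-in word segmenter: lowercase the sentence and extract alphabetic runs."""
    return re.findall(r"[a-z']+", sentence.lower())


def remove_stop_words(words):
    """Drop every word that appears in the stop-word list."""
    return [w for w in words if w not in STOP_WORDS]


def preprocess(comment):
    """Sentence splitting, then word segmentation, then stop-word removal."""
    return [remove_stop_words(segment_words(s)) for s in split_sentences(comment)]


sample = "The explanation is clear; the design of the slides is reasonable."
first_text_data = preprocess(sample)
```

Each inner list holds the remaining words of one sentence, which keeps the sentence boundaries available for the later per-sentence scoring.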
Next, emotion category labeling needs to be performed on the first text data, that is, according to the emotion category of the first text data, an identifier corresponding to the emotion category is bound to the first text data. The emotion categories comprise positive emotion, negative emotion and neutral emotion, wherein the positive emotion can correspond to the identifier of 1, the negative emotion can correspond to the identifier of-1, and the neutral emotion can correspond to the identifier of 0.
Taking classroom teaching comment data on the internet as an example, referring to fig. 2, if the first text data is that the audio-visual effect of the multimedia blackboard writing is reasonable in design, the text data is positive emotion, and therefore the corresponding identifier is 1; if the first text data is 'teaching target is completed by the general middle rule of teaching', the text data is neutral emotion, and therefore the corresponding mark is 0; if the first text data is "daily lesson mode ppt lecture classmate interaction is slightly less", the text data is negative emotion, and therefore the corresponding mark is-1.
The emotion category marking of the first text data is mainly completed by the following method:
First, emotion word analysis is performed on the first text data to obtain an emotion word set containing a plurality of emotion expression words. Specifically, the non-emotion expression words in the first text data are filtered out and the remaining words are weighted to obtain a text vocabulary set; then the near-synonyms of the emotion expression words in the text vocabulary set are obtained and added to the text vocabulary set to obtain the emotion word set.
Taking the online classroom teaching comment data as an example, the first text data is as follows:
{ skilled man.
After filtering and weighting the non-emotion expression words in the first text data, the obtained text vocabulary collection is as follows:
{'skillful': 2, 'reasonable': 1, 'good': 1}.
In the embodiment of the application, the filtering and weighting are carried out according to an industry emotion dictionary. An industry emotion dictionary is a set of the emotion expression words of a specific industry together with the emotion weight of each word, and it is divided by part of speech into a positive emotion dictionary, a negative emotion dictionary and a degree adverb dictionary. The format of the industry emotion dictionary is: {emotion word 1: weight 1, emotion word 2: weight 2, ..., emotion word n: weight n}.
The text vocabulary set obtained in this way contains only a small number of emotion expression words, so it needs to be expanded with near-synonyms. In the embodiment of the application, a word2vec model is used for this: for every emotion expression word in the text vocabulary set, the n most similar words are obtained through the word2vec model, where n is the similarity-ranking cutoff and can be adjusted dynamically, and each near-synonym obtained is given the weight of its original word.
Taking online classroom teaching comment data as an example, the text vocabulary set after near-synonym expansion is, for example: {"skillful": 2, "skilled": 2, "familiar": 2, ..., "sophisticated": 2, ...}.
Once the text vocabulary set has been expanded, the emotion word set is obtained.
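The filtering, weighting and near-synonym expansion described above can be sketched as follows. The dictionary contents and the `NEAREST` lookup table are hypothetical; the table stands in for a nearest-neighbour query against a trained word2vec model, which is not reproduced here.

```python
# Hypothetical industry emotion dictionary: emotion word -> weight.
INDUSTRY_DICT = {"skillful": 2, "reasonable": 1, "good": 1}

# Hypothetical stand-in for a word2vec "most similar words" query,
# listed in descending order of similarity.
NEAREST = {
    "skillful": ["skilled", "familiar", "sophisticated"],
    "good": ["fine"],
}


def build_vocabulary(words, dictionary):
    """Filter out words missing from the dictionary and attach weights to the rest."""
    return {w: dictionary[w] for w in words if w in dictionary}


def expand(vocabulary, nearest, n):
    """Add up to n near-synonyms per word; each inherits the weight of its source word."""
    expanded = dict(vocabulary)
    for word, weight in vocabulary.items():
        for neighbour in nearest.get(word, [])[:n]:
            # Keep an existing weight if the neighbour is already present.
            expanded.setdefault(neighbour, weight)
    return expanded


vocab = build_vocabulary(["skillful", "explanation", "reasonable", "good"], INDUSTRY_DICT)
emotion_words = expand(vocab, NEAREST, n=2)
```

With `n=2`, only the two closest neighbours of each word are added, so `"sophisticated"` is left out in this example; raising `n` widens the emotion word set at the cost of admitting less similar words.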
Further, according to the emotion word set, emotion type labeling is carried out on the first text data, and the specific method comprises the following steps:
extracting all emotion expression words in the first text data according to the emotion word set;
then, respectively calculating a first emotion value corresponding to each emotion expression word, wherein the specific calculation formula is as follows:
$$O_{w_i} = M_{w_a} \times S_{w_i} \quad (1)$$
In formula (1), $w_i$ represents the i-th emotion expression word, where i is a positive integer greater than or equal to 1; $O_{w_i}$ represents the first emotion value corresponding to the i-th emotion expression word; $S_{w_i}$ represents the weight corresponding to the emotion expression word $w_i$; $w_a$ represents a degree adverb; and $M_{w_a}$ represents the weight corresponding to the degree adverb or to an exclamation mark "!";
when a negative word modifying the emotion expression word $w_i$ appears, the emotion polarity must be inverted, and the first emotion value corresponding to the emotion expression word is calculated as:
$$O_{w_i} = M_{w_b} \times S_{w_i} \quad (2)$$
In formula (2), $w_b$ represents the negative word and $M_{w_b}$ represents the weight of the negative word.
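Formulas (1) and (2) can be sketched in a few lines. All weights below are hypothetical, and the handling of an unmodified word (a multiplier of 1) is an assumption, since the source does not state what happens when no degree adverb or negative word is present.

```python
# Hypothetical weights, for illustration only.
EMOTION_WEIGHTS = {"good": 1.0, "skillful": 2.0}          # S for each emotion word
DEGREE_WEIGHTS = {"very": 2.0, "slightly": 0.5, "!": 1.5}  # M for degree adverbs and "!"
NEGATION_WEIGHTS = {"not": -1.0}                           # M for negative words


def first_emotion_value(word, modifier=None):
    """Return O = M * S for one emotion expression word.

    M is the weight of the modifying degree adverb (formula 1) or of the
    modifying negative word (formula 2). Assumption: M = 1 without a modifier.
    """
    s = EMOTION_WEIGHTS[word]
    if modifier in DEGREE_WEIGHTS:
        m = DEGREE_WEIGHTS[modifier]
    elif modifier in NEGATION_WEIGHTS:
        m = NEGATION_WEIGHTS[modifier]
    else:
        m = 1.0
    return m * s


very_good = first_emotion_value("good", "very")
not_good = first_emotion_value("good", "not")
plain = first_emotion_value("skillful")
```

A negative weight for the negation word is what flips the polarity of the emotion word in this sketch.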
Further, according to each first emotion value, calculating a second emotion value corresponding to the first text data, wherein the specific calculation method is as follows:
firstly, a third emotion value corresponding to each sentence is calculated from the first emotion values of the emotion expression words in that sentence, and the specific calculation formula is as follows:
in formula (3), $O_{s_m}$ represents the third emotion value corresponding to the m-th sentence, and h represents the number of emotion expression words contained in the m-th sentence, where m and h are both positive integers greater than or equal to 1;
then, according to the third emotion value corresponding to each sentence, a second emotion value corresponding to the first text data is calculated, and the specific calculation formula is as follows:
in formula (4), $O_{d_j}$ represents the second emotion value corresponding to the first text data.
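Formulas (3) and (4) are not reproduced in this text, so the following sketch only illustrates the two-level aggregation it describes. It assumes plain summation at both levels (word values into a sentence value, sentence values into a text value); the original formulas may use a different aggregation, such as an average.

```python
def sentence_value(word_values):
    """Third emotion value of one sentence.

    Assumption: the sum of the first emotion values of its h emotion words.
    """
    return sum(word_values)


def text_value(sentences):
    """Second emotion value of the text.

    Assumption: the sum of the third emotion values of its sentences.
    """
    return sum(sentence_value(values) for values in sentences)


# Each inner list holds the first emotion values found in one sentence.
first_values = [[2.0, 1.0], [-1.0]]
second_value = text_value(first_values)
```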
Further, determining the emotion type corresponding to each second emotion value;
in this embodiment of the application, the emotion classification corresponding to each second emotion value may be determined by comparing the second emotion value with a preset parameter, and the specific method may be:
if the second emotion value is greater than the preset parameter, the second emotion value corresponds to the positive emotion;
if the second emotion value is equal to the preset parameter, the second emotion value corresponds to a neutral emotion;
and if the second emotion value is smaller than the preset parameter, the second emotion value corresponds to the negative emotion.
In the above process, the preset parameter may be set to 0, or may be set according to an actual situation, and is not specifically limited herein.
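The comparison with the preset parameter can be sketched as below, using the identifiers 1, 0 and -1 given earlier for positive, neutral and negative emotion. The function name and the default of 0 for the preset parameter are illustrative choices.

```python
def emotion_category(second_value, preset=0.0):
    """Map a second emotion value to an emotion category identifier.

    Returns 1 (positive), 0 (neutral) or -1 (negative).
    """
    if second_value > preset:
        return 1
    if second_value < preset:
        return -1
    return 0


labels = [emotion_category(v) for v in (2.0, 0.0, -1.5)]
```

Because neutral requires exact equality with the preset parameter, texts whose positive and negative words do not cancel exactly are never labeled neutral under this rule.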
Furthermore, emotion category labeling is carried out on the first text data, and N second text data containing emotion category identifications are obtained.
S12, training the first language model by using second text data to obtain a second language model for emotion scoring;
In the embodiment of the present application, the first language model may be a BERT (Bidirectional Encoder Representations from Transformers) language model, which is pre-trained on a large number of data sets and has a strong semantic analysis capability.
In order to enable the BERT language model to be more suitable for text data of comment types, the method comprises the steps of firstly, carrying out semantic analysis training on the BERT language model according to an original pre-training method by using second text data to obtain a third language model, wherein the third language model has stronger semantic analysis capability aiming at the text data of the comment types;
and then, carrying out emotion scoring training on the third language model to obtain a second language model.
As shown in fig. 3, a flowchart of a method for emotion scoring training for a third language model includes the following steps:
S31, selecting M groups of training samples in the second text data, wherein each group of training samples comprises K emotion expression words;
in the embodiment of the present application, M and K are both positive integers greater than or equal to 1, and each training sample is an L × E-order matrix, as shown in formula (5):
in formula (5), $a_r$ represents the r-th training sample, r is a positive integer greater than or equal to 1 and less than or equal to M, each element of $a_r$ represents an emotion expression word, L represents the length of a training sample, E represents the word vector dimension, and $L \cdot E = K$.
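Reading the description of formula (5) literally, a training sample arranges K = L · E emotion expression words into an L × E matrix. A minimal sketch of that layout, assuming row-major filling and a hypothetical helper `build_training_sample` (neither is stated in the text):

```python
from typing import List, Sequence


def build_training_sample(words: Sequence[str], length: int, dim: int) -> List[List[str]]:
    """Arrange K = length * dim emotion expression words into an L x E matrix (row-major)."""
    if len(words) != length * dim:
        raise ValueError("expected exactly L * E = K words")
    return [list(words[i * dim:(i + 1) * dim]) for i in range(length)]


# Hypothetical emotion expression words, K = 6, L = 3, E = 2.
sample = build_training_sample(
    ["good", "great", "poor", "bad", "fine", "awful"], length=3, dim=2
)
```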
S32, sequentially inputting each training sample set into a third language model, and respectively calculating to obtain semantic features corresponding to each training sample;
in the embodiment of the present application, after each training sample is input into the third language model, the semantic vector corresponding to the training sample is obtained through calculation, as shown in formula (6):
$$\mathrm{Sem} = (\mathrm{cls}, \mathrm{token}_1, \mathrm{token}_2, \ldots, \mathrm{token}_L, \mathrm{sep}) \tag{6}$$
in formula (6), Sem represents the semantic vector of the training sample, L is a positive integer greater than or equal to 1, $\mathrm{cls} = (x_1, x_2, \ldots, x_E)$ represents the semantics of the training sample, and $\mathrm{token}_t = (p_1, p_2, \ldots, p_E)$ represents the semantics of each word in the training sample, where t is a positive integer greater than or equal to 1 and less than or equal to L.
Then, a fully connected layer calculation is performed on the semantics of the training sample in the semantic vector to obtain the semantic features corresponding to the training sample. The specific calculation formula is as follows:
$$z = v^{T} \cdot \mathrm{cls} + b \tag{7}$$
in formula (7), z represents the semantic feature matrix of the training sample, and $v^{T}$ and b represent the parameters of the fully connected layer, which can be adjusted according to the emotion categories of different samples.
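Formula (7) is an affine map applied to the cls vector. The sketch below computes it in plain Python, assuming that v is an E × D weight matrix and b a length-D bias; these shapes are an assumption, since the text does not give the output dimension, and the numeric values are made up for illustration.

```python
from typing import List


def fully_connected(cls: List[float], v: List[List[float]], b: List[float]) -> List[float]:
    """Compute z = v^T . cls + b, where v has shape E x D and cls has length E."""
    if len(v) != len(cls):
        raise ValueError("v must have one row per component of cls")
    return [
        sum(v[i][j] * cls[i] for i in range(len(cls))) + b[j]
        for j in range(len(b))
    ]


# Hypothetical values with E = 3 and D = 2.
cls_vec = [1.0, 2.0, 3.0]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.5, -0.5]
z = fully_connected(cls_vec, v, b)
```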
S33, respectively calculating the emotion value corresponding to each training sample according to the semantic features;
in the embodiment of the present application, all elements of the semantic feature matrix are summed and the result is normalized to obtain the emotion value corresponding to the training sample; the emotion value is mapped to the interval [0, 100].
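The text does not say which normalization is used. As one possible reading, the sketch below sums the features and squashes the sum with a logistic function scaled to [0, 100]; the choice of the logistic function and the name `emotion_value` are assumptions.

```python
import math
from typing import Iterable


def emotion_value(features: Iterable[float]) -> float:
    """Sum the semantic features, then map the sum into [0, 100] with a logistic function."""
    total = sum(features)
    # Branch on the sign so that math.exp never receives a large positive argument.
    if total >= 0:
        squashed = 1.0 / (1.0 + math.exp(-total))
    else:
        e = math.exp(total)
        squashed = e / (1.0 + e)
    return 100.0 * squashed


# A feature sum of zero lands in the middle of the interval.
midpoint = emotion_value([0.0, 0.0])
```

Any monotonic map onto [0, 100] (for example min-max scaling over a batch) would satisfy the description equally well.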
S34, sequentially judging whether the emotion value is greater than a preset threshold value;
in the embodiment of the present application, after each training sample has been used to train the third language model, the generated emotion value is compared with the preset threshold; if the emotion value is greater than the preset threshold, step S35 is executed, otherwise execution returns to step S32.
S35, if yes, ending the emotion scoring training;
With this method, the second language model obtained by performing emotion scoring training on the third language model can perform emotion scoring on text data that carries no emotion category labels.
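The control flow of steps S32 to S35 can be sketched as a loop with a threshold-based stop. The callable `train_and_score` stands in for one training pass plus emotion value calculation and is hypothetical; the `max_rounds` guard is also an assumption added so that the sketch always terminates.

```python
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def emotion_scoring_training(
    samples: Sequence[Sample],
    train_and_score: Callable[[Sample], float],
    threshold: float,
    max_rounds: int = 100,
) -> bool:
    """Repeat S32-S34 over the samples and stop (S35) once a value exceeds the threshold.

    Returns True if training ended via S35, False if max_rounds was exhausted.
    """
    for _ in range(max_rounds):
        for sample in samples:
            value = train_and_score(sample)  # S32 and S33: train, then compute the emotion value
            if value > threshold:            # S34: compare with the preset threshold
                return True                  # S35: end the emotion scoring training
    return False


class ToyModel:
    """Stand-in model whose emotion value grows by 10 on every call."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sample: str) -> float:
        self.calls += 1
        return 10.0 * self.calls


toy = ToyModel()
finished = emotion_scoring_training(["a", "b", "c"], toy, threshold=45.0)
```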
And S13, carrying out emotion scoring on the acquired text data based on the second language model, and generating emotion values corresponding to the text data.
In the embodiment of the present application, performing emotion scoring on the obtained text data to generate an emotion value corresponding to the text data, including:
acquiring text data;
sentence dividing processing, word dividing processing and stop word processing are carried out on the text data;
and inputting the processed text data into a second language model to generate an emotion value corresponding to the text data.
The method and principle for processing the acquired text data are the same as those for processing the acquired first text data, and are not repeated herein.
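The preprocessing and scoring steps of S13 can be sketched as below. The stop-word list, the regular expressions, and the `model` callable are all assumptions for illustration; in particular, whitespace and punctuation-based tokenization only suits space-delimited languages, and Chinese text would need a dedicated word segmenter.

```python
import re
from typing import Callable, FrozenSet, List

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "is", "was", "and"})


def preprocess(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> List[List[str]]:
    """Split text into sentences, tokenize each sentence, and drop stop words."""
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    result = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        kept = [t for t in tokens if t not in stop_words]
        if kept:
            result.append(kept)
    return result


def score_text(text: str, model: Callable[[List[List[str]]], float]) -> float:
    """Preprocess the text and pass it to a scoring model (here any callable)."""
    return model(preprocess(text))


processed = preprocess("The course was great. The pace is slow!")
```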
According to the emotion scoring method provided by the present application, semantic analysis training is first performed on the first language model and emotion scoring training is performed afterwards, so that the resulting second language model has strong semantic analysis capability and high emotion scoring accuracy. In addition, when the language model is used to score the acquired text data, the text data does not need to be manually labeled with emotion categories. This saves labor and time costs, avoids the limitation of being applicable only to labeled sample data, and broadens the usage scenarios.
Based on the same inventive concept, an emotion scoring device is further provided in the embodiments of the present application, as shown in fig. 4, which is a schematic structural diagram of an emotion scoring device in the present application, and the device includes:
the labeling module 41 is configured to label emotion categories of the first text data to obtain second text data including emotion category identifiers;
a training module 42, configured to train the first language model with the second text data to obtain a second language model for emotion scoring;
and a scoring module 43, configured to perform emotion scoring on the obtained text data based on the second language model, and generate an emotion value corresponding to the text data.
In one possible design, the apparatus further includes:
the acquisition module is used for acquiring third text data;
and the processing module is used for performing sentence segmentation processing, word segmentation processing and stop word processing on the third text data respectively to obtain the first text data.
Further, the labeling module 41 is specifically configured to:
analyzing and processing the first text data to obtain an emotion word set comprising a plurality of emotion expression words;
and according to the emotion word set, performing emotion category labeling on the first text data to obtain second text data containing the emotion category identification.
Further, the labeling module 41 is further configured to:
filtering and weighting the non-emotion expression words in the first text data to obtain a text vocabulary set;
and acquiring near-meaning words of the emotion expression words in the text vocabulary set, and adding the near-meaning words into the text vocabulary set to obtain the emotion word set.
Further, the labeling module 41 is further configured to:
extracting all emotion expression words in the first text data according to the emotion word set;
respectively calculating a first emotion value corresponding to each emotion expression word;
calculating a second emotion value corresponding to the first text data according to each first emotion value;
determining the emotion type corresponding to the second emotion value;
and carrying out emotion category marking on the first text data to obtain the second text data containing the emotion category identification.
Further, the training module 42 is specifically configured to:
according to the second text data, performing semantic analysis training on the first language model to obtain a third language model;
and performing emotion scoring training on the third language model according to the second text data to obtain the second language model for emotion scoring.
Further, the training module 42 is further configured to:
selecting M groups of training samples in the second text data, wherein each group of training samples comprises K emotion expression words, and M and K are positive integers greater than or equal to 1;
sequentially inputting each training sample set into the third language model, and respectively calculating to obtain semantic features corresponding to each training sample;
respectively calculating the emotion value of each training sample according to the semantic features in sequence;
sequentially judging whether the emotion value is larger than a preset threshold value or not;
if so, ending the emotion scoring training, otherwise, continuing to perform emotion scoring training on the third language model.
With the emotion scoring device, semantic analysis training is first performed on the first language model and emotion scoring training is performed afterwards, so that the resulting second language model has strong semantic analysis capability and high emotion scoring accuracy. In addition, when the language model is used to score the acquired text data, the text data does not need to be manually labeled with emotion categories. This saves labor and time costs, avoids the limitation of being applicable only to labeled sample data, and broadens the usage scenarios.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing emotion scoring apparatus, and with reference to fig. 5, the electronic device includes:
at least one processor 51, and a memory 52 connected to the at least one processor 51. In this embodiment, the specific connection medium between the processor 51 and the memory 52 is not limited; fig. 5 illustrates an example in which the processor 51 and the memory 52 are connected through a bus 50. The bus 50 is shown in fig. 5 by a thick line, and the connections between other components are merely illustrative and not limiting. The bus 50 may be divided into an address bus, a data bus, a control bus, etc.; it is drawn as a single thick line in fig. 5 for ease of illustration, which does not mean that there is only one bus or one type of bus. The processor 51 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 52 stores instructions executable by the at least one processor 51, and the at least one processor 51 can execute the emotion scoring method discussed above by executing the instructions stored in the memory 52. The processor 51 may implement the functions of the various modules in the apparatus shown in fig. 4.
The processor 51 is the control center of the apparatus. It may be connected to the various parts of the apparatus through various interfaces and lines, and it performs the functions of the apparatus and processes data by running or executing the instructions stored in the memory 52 and calling the data stored in the memory 52, thereby monitoring the apparatus as a whole.
In one possible design, processor 51 may include one or more processing units, and processor 51 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, and the like, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 51. In some embodiments, the processor 51 and the memory 52 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 51 may be a general-purpose processor, a Central Processing Unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the emotion scoring method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
The memory 52, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 52 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 52 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 52 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
The processor 51 is programmed to solidify the codes corresponding to the emotion scoring method described in the foregoing embodiment into a chip, so that the chip can execute the steps of the emotion scoring method of the embodiment shown in fig. 1 when running. How to program the processor 51 is well known to those skilled in the art and will not be described in detail here.
Based on the same inventive concept, the present application also provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the emotion scoring method discussed above.
In some possible embodiments, the aspects of the emotion scoring method provided by the present application may also be implemented in the form of a program product including program code for causing the control apparatus to perform the steps of the emotion scoring method according to various exemplary embodiments of the present application described above in this specification when the program product is run on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. An emotion scoring method, characterized in that the method comprises:
carrying out emotion category labeling on the first text data to obtain second text data containing emotion category identification;
training the first language model by using the second text data to obtain a second language model for emotion scoring;
and carrying out emotion scoring on the acquired text data based on the second language model to generate an emotion value corresponding to the text data.
2. The method of claim 1, wherein before said emotion class labeling of the first text data, obtaining the second text data containing emotion class identifiers, further comprising:
acquiring third text data;
and performing sentence segmentation processing, word segmentation processing and stop word processing on the third text data respectively to obtain the first text data.
3. The method of claim 1, wherein said annotating the emotion category of the first text data to obtain the second text data comprising an emotion category identification comprises:
analyzing and processing the first text data to obtain an emotion word set comprising a plurality of emotion expression words;
and according to the emotion word set, performing emotion category labeling on the first text data to obtain second text data containing the emotion category identification.
4. The method of claim 3, wherein performing emotion word analysis processing on the first text data to obtain an emotion word set including a plurality of emotion expression words comprises:
filtering and weighting the non-emotion expression words in the first text data to obtain a text vocabulary set;
and acquiring near-meaning words of the emotion expression words in the text vocabulary set, and adding the near-meaning words into the text vocabulary set to obtain the emotion word set.
5. The method of claim 3, wherein performing emotion category labeling on the first text data according to the emotion word set to obtain the second text data containing the emotion category identifier comprises:
extracting all emotion expression words in the first text data according to the emotion word set;
respectively calculating a first emotion value corresponding to each emotion expression word;
calculating a second emotion value corresponding to the first text data according to each first emotion value;
determining the emotion type corresponding to the second emotion value;
and carrying out emotion category marking on the first text data to obtain the second text data containing the emotion category identification.
6. The method of claim 1, wherein training the first language model with the second textual data to obtain a second language model for emotion scoring comprises:
according to the second text data, performing semantic analysis training on the first language model to obtain a third language model;
and performing emotion scoring training on the third language model according to the second text data to obtain the second language model for emotion scoring.
7. The method of claim 6, wherein performing emotion scoring training on the third language model based on the second text data to obtain the second language model for emotion scoring, comprises:
selecting M groups of training samples in the second text data, wherein each group of training samples comprises K emotion expression words, and M and K are positive integers greater than or equal to 1;
sequentially inputting each training sample set into the third language model, and respectively calculating to obtain semantic features corresponding to each training sample;
respectively calculating the emotion value of each training sample according to the semantic features in sequence;
sequentially judging whether the emotion value is larger than a preset threshold value or not;
if so, ending the emotion scoring training, otherwise, continuing to perform emotion scoring training on the third language model.
8. An emotion scoring apparatus, characterized in that the apparatus comprises:
the marking module is used for marking the emotion types of the first text data to obtain second text data containing emotion type identifications;
the training module is used for training the first language model by using the second text data to obtain a second language model for emotion scoring;
and the scoring module is used for carrying out emotion scoring on the acquired text data based on the second language model and generating an emotion value corresponding to the text data.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-7 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111126984.4A CN113886580A (en) | 2021-09-26 | 2021-09-26 | Emotion scoring method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111126984.4A CN113886580A (en) | 2021-09-26 | 2021-09-26 | Emotion scoring method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113886580A true CN113886580A (en) | 2022-01-04 |
Family
ID=79006585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111126984.4A Pending CN113886580A (en) | 2021-09-26 | 2021-09-26 | Emotion scoring method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113886580A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114757489A (en) * | 2022-03-18 | 2022-07-15 | 国网电子商务有限公司 | Business index generation method and device, electronic equipment and storage medium |
CN116108859A (en) * | 2023-03-17 | 2023-05-12 | 美云智数科技有限公司 | Emotional tendency determination, sample construction and model training methods, devices and equipment |
-
2021
- 2021-09-26 CN CN202111126984.4A patent/CN113886580A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108984530A (en) | A kind of detection method and detection system of network sensitive content | |
CN110825867B (en) | Similar text recommendation method and device, electronic equipment and storage medium | |
CN112100388A (en) | Method for analyzing emotional polarity of long text news public sentiment | |
CN109522412B (en) | Text emotion analysis method, device and medium | |
CN111930792B (en) | Labeling method and device for data resources, storage medium and electronic equipment | |
CN113886580A (en) | Emotion scoring method and device and electronic equipment | |
CN115392237B (en) | Emotion analysis model training method, device, equipment and storage medium | |
CN111897955B (en) | Comment generation method, device, equipment and storage medium based on encoding and decoding | |
CN108009248A (en) | A kind of data classification method and system | |
CN110263148A (en) | Intelligent resume selection method and device | |
Cobos et al. | Moods in MOOCs: Analyzing emotions in the content of online courses with edX-CAS | |
CN108090098A (en) | A kind of text handling method and device | |
CN117171350A (en) | Knowledge graph-based personalized course learning environment construction method and device | |
Kortum et al. | Dissection of AI job advertisements: A text mining-based analysis of employee skills in the disciplines computer vision and natural language processing | |
CN117077679B (en) | Named entity recognition method and device | |
CN113254814A (en) | Network course video labeling method and device, electronic equipment and medium | |
CN116628162A (en) | Semantic question-answering method, device, equipment and storage medium | |
CN112328812B (en) | Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment | |
CN115292489A (en) | Enterprise public opinion analysis method, device, equipment and storage medium | |
CN114862141A (en) | Method, device and equipment for recommending courses based on portrait relevance and storage medium | |
Alrajhi et al. | Plug & Play with Deep Neural Networks: Classifying Posts that Need Urgent Intervention in MOOCs | |
CN114021004A (en) | Method, device and equipment for recommending science similar questions and readable storage medium | |
CN112347150A (en) | Method and device for labeling academic label of student and electronic equipment | |
CN111563162A (en) | MOOC comment analysis system and method based on text emotion analysis | |
CN112465227A (en) | Teaching data acquisition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||