CN112036175A - Domain text emotion recognition method and device, computer equipment and storage medium - Google Patents

Domain text emotion recognition method and device, computer equipment and storage medium

Info

Publication number
CN112036175A
Authority
CN
China
Prior art keywords
emotion
word
clause
preset
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010694597.XA
Other languages
Chinese (zh)
Inventor
沈春泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Financial Technology Nanjing Co Ltd
Original Assignee
Suning Financial Technology Nanjing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Financial Technology Nanjing Co Ltd filed Critical Suning Financial Technology Nanjing Co Ltd
Priority to CN202010694597.XA priority Critical patent/CN112036175A/en
Publication of CN112036175A publication Critical patent/CN112036175A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for recognizing emotion of a field text, a computer device and a storage medium, belonging to the technical field of text processing. The method comprises the following steps: performing clause segmentation and word segmentation on the field text to be recognized; matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located; calculating and normalizing the emotion intensity value of the field text based on the preset emotion intensity of each emotion word, the clause in which each emotion word is located, and the preset position weights of clauses at different positions of the field text; and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text. The method and the device can effectively improve the accuracy of text emotion recognition in a specific field.

Description

Domain text emotion recognition method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of text information processing, in particular to a method and a device for recognizing emotion of a field text, computer equipment and a storage medium.
Background
Domain text emotion recognition is the process of analyzing, processing, and extracting information from subjective, emotionally colored domain texts using natural language processing and text mining technologies, so as to recognize whether the subjective tendency of a text is positive or negative.
At present, domain text emotion recognition is widely used in many areas of natural language processing, and a typical scenario is a public opinion monitoring system. For example, in the financial field, business personnel usually treat public opinion information as an important source to watch. In particular, when a large amount of negative information about an enterprise, such as defaults or senior management changes, appears within a short time, it often has a large negative impact on the normal business activities of the enterprise. By collecting the public information associated with an enterprise and analyzing, processing, and summarizing the emotion of the domain text, public opinion risk information about the enterprise can be provided and its public opinion trend can be tracked in time. This helps business personnel discover potential risks of the enterprise promptly and carry out risk management, thereby avoiding or reducing credit risk.
Most existing text emotion recognition methods use conventional machine learning models. In specialized industries, however, a large number of training samples must be labeled manually in advance, which is time-consuming and labor-intensive. The high labeling cost limits the number of high-quality samples, and an insufficient number of samples degrades model performance, which in turn reduces the accuracy of text emotion recognition in the specific field.
Disclosure of Invention
In order to solve the problems mentioned in the background art, the invention provides a method and a device for recognizing emotion of a domain text, a computer device and a storage medium.
In a first aspect, a method for recognizing emotion of domain text is provided, and the method includes:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located;
calculating and normalizing emotion intensity values of the field text based on preset emotion intensity of each emotion word, a clause where each emotion word is located and preset position weight of each clause at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
Further, the emotion dictionary is constructed by the following method:
constructing a seed dictionary in a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight;
carrying out clause segmentation on the text in the preset corpus of the specific field, carrying out shallow syntactic analysis on each obtained clause, analyzing the syntactic components of each clause and forming a syntactic tree;
based on a functional grammar theory, classifying words with the same function on the syntax tree into the same word category;
taking each word in the word category with the preset category label as a candidate word respectively, and acquiring a co-occurrence word of each candidate word by combining the context to form a candidate word set;
and screening the expansion words of the seed emotion words from the candidate word set, and updating the seed dictionary according to the expansion words of the seed emotion words to construct the emotion dictionary.
Further, the calculating the emotion intensity value of the field text based on the preset emotion intensity of each emotion word, the clause where each emotion word is located, and the preset position weight of each clause at different positions of the field text includes:
calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs;
and calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause.
Further, the calculating, based on the emotion polarity to which the preset emotion intensity of each emotion word belongs, an emotion intensity value of the clause in which each emotion word is located under each emotion polarity includes:
adjusting the preset emotion intensity of each emotion word based on the font form of each emotion word;
and calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity adjusted by each emotion word belongs.
Further, the calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause includes:
adjusting the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on punctuation marks, degree adverbs and/or negative words in the clause where each emotion word is located;
and calculating the emotion intensity value of the field text under each emotion polarity based on the preset position weight of each clause and the emotion intensity value of the clause where each emotion word is located after the clause is adjusted under each emotion polarity.
Further, the method further comprises:
summing the normalized emotion intensity values of the domain texts under the emotion polarities to obtain a comprehensive value of the emotion intensity values of the domain texts;
and when the comprehensive value of the emotion intensity value of the field text meets a preset early warning trigger condition, generating early warning information and pushing the early warning information to a preset terminal.
In a second aspect, a domain text emotion recognition apparatus is provided, the apparatus including:
the preprocessing module is used for segmenting sentences and words of the field text to be recognized;
the matching module is used for matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located;
the calculation module is used for calculating and normalizing the emotion intensity value of the field text based on the preset emotion intensity of each emotion word and the preset position weight of each clause at different positions of the field text;
and the output module is used for outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
Further, the apparatus further comprises a construction module, which is specifically configured to:
constructing a seed dictionary in a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight;
carrying out clause segmentation on the text in the preset corpus of the specific field, carrying out shallow syntactic analysis on each obtained clause, analyzing the syntactic components of each clause and forming a syntactic tree;
based on a functional grammar theory, classifying words with the same function on the syntax tree into the same word category;
taking each word in the word category with the preset category label as a candidate word respectively, and acquiring a co-occurrence word of each candidate word by combining the context to form a candidate word set;
and screening the expansion words of the seed emotion words from the candidate word set, and updating the seed dictionary according to the expansion words of the seed emotion words to construct the emotion dictionary.
Further, the calculation module includes:
the first calculation submodule is used for calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs;
and the second calculation submodule is used for calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause.
Further, the first computation submodule is specifically configured to:
adjusting the preset emotion intensity of each emotion word based on the font form of each emotion word;
and calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity adjusted by each emotion word belongs.
Further, the second computation submodule is specifically configured to:
adjusting the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on punctuation marks, degree adverbs and/or negative words in the clause where each emotion word is located;
and calculating the emotion intensity value of the field text under each emotion polarity based on the preset position weight of each clause and the emotion intensity value of the clause where each emotion word is located after the clause is adjusted under each emotion polarity.
Further, the output module is further configured to:
summing the normalized emotion intensity values of the domain texts under the emotion polarities to obtain a comprehensive value of the emotion intensity values of the domain texts;
and when the comprehensive value of the emotion intensity value of the field text meets a preset early warning trigger condition, generating early warning information and pushing the early warning information to a preset terminal.
In a third aspect, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located;
calculating and normalizing emotion intensity values of the field text based on preset emotion intensity of each emotion word, a clause where each emotion word is located and preset position weight of each clause at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located;
calculating and normalizing emotion intensity values of the field text based on preset emotion intensity of each emotion word and preset position weights of each clause at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
The invention provides a method and a device for recognizing emotion of a field text, a computer device and a storage medium. The field text to be recognized is divided into clauses and words; the segmented words of the field text are matched against a preset emotion dictionary to determine at least one emotion word, the preset emotion intensity of each emotion word, and the clause in which each emotion word is located; the emotion intensity value of the field text is calculated and normalized based on the preset emotion intensity of each emotion word, the clause in which each emotion word is located, and the preset position weights of clauses at different positions of the field text; and a text emotion recognition result is output based on the normalized emotion intensity value of the field text. Unlike machine learning methods, which need large training data sets, the method constructs an emotion dictionary for a specific field to match the emotion words in the field text and their preset emotion intensities, determines the clauses in which the emotion words are located, and recognizes the emotion of the field text in combination with the preset position weights of the clauses. This can effectively improve the efficiency and accuracy of emotion recognition.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for recognizing emotion of a domain text according to an embodiment of the present invention;
FIG. 2 shows a flow chart for constructing an emotion dictionary provided by an embodiment of the present invention;
FIG. 3 shows a detailed flowchart of step 103 shown in FIG. 1;
FIG. 4 illustrates a syntax tree structure provided by an embodiment of the present invention;
FIG. 5 is a block diagram illustrating a domain text emotion recognition apparatus according to an embodiment of the present invention;
FIG. 6 shows a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that, unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
Furthermore, in the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
As described in the background above, most existing text emotion recognition methods use conventional machine learning models. In specialized industries, however, manually labeling a large number of training samples in advance is time-consuming and labor-intensive. The high labeling cost limits the number of high-quality samples, and an insufficient number of samples degrades model performance, which in turn reduces the accuracy of text emotion recognition in the specific field. Therefore, the embodiment of the invention provides a domain text emotion recognition method. Unlike machine learning methods, which need large training data sets, the method constructs an emotion dictionary for a specific field to match the emotion words in a field text and their preset emotion intensities, determines the clauses in which the emotion words are located, and recognizes the emotion of the field text in combination with the preset position weights of the clauses, which can effectively improve the efficiency and accuracy of emotion recognition. The method is described here as applied to the financial field as an example; it can be understood that it can also be applied to other specific fields, such as the medical field or the commodity field.
FIG. 1 is a flowchart illustrating a method for recognizing emotion of a domain text according to an embodiment of the present invention. As shown in FIG. 1, the method may include:
Step 101: performing sentence segmentation and word segmentation on the field text to be recognized.
The field text to be recognized is a text related to the target enterprise, and the field text may be entered by a user, such as by voice, handwriting or other means, or may be crawled by a crawler tool, such as a financial news report.
Specifically, the field text to be recognized may first be preprocessed, for example by de-duplication and de-noising, to obtain plain text. The plain text is then split into clauses, and each clause is segmented into words. Clause segmentation may use punctuation marks that indicate the end of a sentence, such as the period, semicolon, question mark, and exclamation mark, as clause separators; word segmentation may split each clause into a plurality of words using a Chinese word segmentation tool.
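The segmentation in step 101 can be sketched as follows. This is a minimal illustration, not the tool used in the embodiment: the separator set `CLAUSE_END`, the function names, and the forward-maximum-matching segmenter (a simple stand-in for a real Chinese word segmentation tool) are all assumptions made for the example.

```python
import re

# Assumed set of clause-ending punctuation (full-width and ASCII forms).
CLAUSE_END = "。；？！;?!"

def split_clauses(text):
    """Split plain text into clauses at clause-ending punctuation."""
    parts = re.split("[" + re.escape(CLAUSE_END) + "]+", text)
    return [p.strip() for p in parts if p.strip()]

def segment_words(clause, vocab, max_len=4):
    """Forward maximum matching against a vocabulary.

    A deliberately simple stand-in for a real segmenter: at each position,
    take the longest vocabulary entry, falling back to a single character.
    """
    words, i = [], 0
    while i < len(clause):
        for size in range(min(max_len, len(clause) - i), 0, -1):
            piece = clause[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical vocabulary: "公司" (company), "违约" (default),
# "营收" (revenue), "增长" (growth).
vocab = {"公司", "违约", "营收", "增长"}
clauses = split_clauses("营收增长。公司违约！")
tokens = [segment_words(c, vocab) for c in clauses]
```

In practice the vocabulary-based segmenter would be replaced by whichever word segmentation tool the deployment already uses; only the clause and word lists matter to the later steps.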
Step 102: matching the segmented words of the field text against a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located.
The preset emotion dictionary comprises a plurality of emotion words and the preset emotion intensity of each emotion word. The emotion intensity of each emotion word can be preset according to its emotion polarity, in combination with expert experience. The emotion polarity comprises positive emotion, neutral emotion, and negative emotion, and the emotion intensity represents how strong the emotion of the word is: for a negative emotion word, the emotion intensity value can be set to "-4", "-3", "-2", or "-1"; for a neutral emotion word, "0"; and for a positive emotion word, "1", "2", "3", or "4".
Specifically, the segmented words in each clause may be matched against the emotion dictionary to judge whether the clause contains an emotion word that matches the dictionary; if so, the preset emotion intensity of that emotion word is obtained from the emotion dictionary.
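A minimal sketch of the matching in step 102 follows. The dictionary entries, their intensities, and the function name are hypothetical and chosen only to illustrate the lookup; English tokens are used for readability.

```python
# Hypothetical emotion dictionary: emotion word -> preset intensity in [-4, 4].
EMOTION_DICT = {"default": -3, "bankruptcy": -4, "growth": 3}

def match_emotion_words(clauses, emotion_dict):
    """Return (clause_index, word, preset_intensity) for every matched word.

    `clauses` is a list of clauses, each already segmented into words.
    """
    hits = []
    for idx, words in enumerate(clauses):
        for word in words:
            if word in emotion_dict:
                hits.append((idx, word, emotion_dict[word]))
    return hits

clauses = [
    ["revenue", "growth", "continued"],
    ["the", "subsidiary", "faces", "default"],
]
hits = match_emotion_words(clauses, EMOTION_DICT)
```

Keeping the clause index with each hit is what lets the later steps apply a position weight per clause.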
Step 103: calculating and normalizing the emotion intensity value of the field text based on the preset emotion intensity of each emotion word, the clause in which each emotion word is located, and the preset position weight of each clause at different positions of the field text.
The domain text is generally a long passage made up of different components, including a title, an abstract, paragraphs, paragraph titles, and the like. Different position weights can be preset for the different components according to expert experience, so as to better recognize the emotion of the text.
Specifically, the process may include:
determining, in the field text, the clause in which each emotion word is located, and determining the emotion polarity to which the emotion intensity of each emotion word in the clause belongs;
for each clause, summing the emotion intensities of the emotion words under each emotion polarity to obtain the emotion intensity value of the clause under each emotion polarity;
calculating the emotion intensity value of the field text under each emotion polarity according to the preset position weight of each clause and the emotion intensity value of each clause under each emotion polarity;
and normalizing the emotion intensity values of the field text under each emotion polarity, so that each normalized value lies between "-1" and "1". The normalization can use the formula
[Formula image BDA0002590520770000091, not reproduced in this text]
where x is the emotion intensity value and a is a preset normalization parameter.
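The calculation in step 103 can be sketched as below. The normalization formula is only available as an image in the source, so the form `x / sqrt(x^2 + a)` used here is an assumption: it is one common function that maps any real value into the open interval (-1, 1) with a single parameter. The position weights, the default `a = 15.0`, and all function names are likewise hypothetical.

```python
import math

def clause_scores(hits, n_clauses):
    """Sum preset intensities per clause, separately for each polarity."""
    pos = [0.0] * n_clauses
    neg = [0.0] * n_clauses
    for idx, _word, intensity in hits:
        if intensity > 0:
            pos[idx] += intensity
        elif intensity < 0:
            neg[idx] += intensity
    return pos, neg

def text_score(scores, weights):
    """Position-weighted sum of the clause scores for one polarity."""
    return sum(s * w for s, w in zip(scores, weights))

def normalize(x, a=15.0):
    """ASSUMED normalization x / sqrt(x^2 + a); the result lies in (-1, 1)."""
    return x / math.sqrt(x * x + a)

# (clause_index, word, preset_intensity) triples from dictionary matching.
hits = [(0, "growth", 3), (1, "default", -3), (1, "bankruptcy", -4)]
weights = [2.0, 1.0]  # hypothetical: clause 0 is the title, clause 1 is body text

pos, neg = clause_scores(hits, n_clauses=2)
pos_norm = normalize(text_score(pos, weights))
neg_norm = normalize(text_score(neg, weights))
```

With these inputs the weighted negative score is -7, which the assumed formula maps to -7 / sqrt(49 + 15) = -0.875; a larger `a` would pull all normalized values closer to zero.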
Step 104: outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
Specifically, the normalized emotion intensity values of the domain text under the emotion polarities and the integrated value of the emotion intensity values of the domain text are output.
In this embodiment, by outputting the emotion intensity values under the different emotion polarities, the user can directly compare the positive and negative emotions contained in a field text and gain a more complete picture of its emotion distribution.
In the method for recognizing emotion of a field text provided by this embodiment, the field text to be recognized is divided into clauses and words; the segmented words in each clause are matched against a preset emotion dictionary to determine at least one emotion word, the preset emotion intensity of each emotion word, and the clause in which each emotion word is located; the emotion intensity value of the field text is calculated and normalized based on the preset emotion intensity of each emotion word, the clause in which each emotion word is located, and the preset position weights of clauses at different positions of the field text; and a text emotion recognition result is output based on the normalized emotion intensity value. Unlike machine learning methods, which need large training data sets, the method constructs an emotion dictionary for a specific field to determine the emotion words in the field text, their clauses, and their preset emotion intensities, and recognizes the emotion of the field text in combination with the preset position weights of the clauses, which can effectively improve the efficiency and accuracy of emotion recognition.
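As a rough sketch of the output in step 104, the per-polarity values can be returned together with their sum, and the sum compared with the early-warning trigger condition described in the disclosure. The function name, the result keys, and the threshold value are assumptions for illustration only; the source does not specify the trigger condition.

```python
def build_result(pos_norm, neg_norm, warn_threshold=-0.5):
    """Assemble the recognition result.

    `warn_threshold` is a hypothetical trigger: a warning is raised when the
    combined value is at or below it.
    """
    combined = pos_norm + neg_norm
    return {
        "positive": pos_norm,
        "negative": neg_norm,
        "combined": combined,
        "warning": combined <= warn_threshold,
    }

result = build_result(0.25, -0.875)
```

Reporting both polarities alongside the combined value preserves the comparison between positive and negative emotion that a single summed score would hide.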
In one embodiment, referring to fig. 2, the emotion dictionary is constructed by the following steps:
Step 201: constructing a seed dictionary of a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight.
In practical use, the text emotion recognition mainly uses 4 categories of dictionaries: emotional words, degree adverbs, negative words, domain specialty words. The construction of the emotion word dictionary is basic work which is time-consuming and large in investment, and particularly in the financial field, no open professional emotion word dictionary is found at present.
Specifically, seed emotion words related to enterprises in the financial field can be extracted manually with the help of expert experience, and each seed emotion word is labeled with a corresponding emotion intensity. The value range of the emotion intensity is divided in advance into several threshold intervals, and different intervals correspond to different emotion polarities. The value range can be set according to actual needs, for example from -4 to +4, where emotion intensities of -4, -3, -2, and -1 correspond to negative emotion, emotion intensities of 1, 2, 3, and 4 correspond to positive emotion, and the midpoint 0 corresponds to neutral emotion. Illustratively, the emotion intensities of "closing down" and "revenue growth" may be labeled as -4 and +3 respectively, with corresponding emotion polarities of negative emotion and positive emotion.
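The labeling scheme above can be sketched as a small data structure. The two entries come from the example in the text; the names `SEED_DICT` and `polarity_of` are hypothetical.

```python
# Seed dictionary: seed emotion word -> labeled intensity in [-4, 4].
SEED_DICT = {
    "closing down": -4,
    "revenue growth": 3,
}

def polarity_of(intensity):
    """Map an intensity in [-4, 4] to its emotion polarity."""
    if not -4 <= intensity <= 4:
        raise ValueError("intensity outside the assumed range -4..4")
    if intensity < 0:
        return "negative"
    if intensity > 0:
        return "positive"
    return "neutral"

polarities = {word: polarity_of(v) for word, v in SEED_DICT.items()}
```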
Step 202: splitting the texts in a preset corpus of the specific field into clauses, performing shallow syntactic analysis on each obtained clause, and analyzing the syntactic components of each clause to form a syntax tree.
A crawler tool may be used to crawl financial news reports related to enterprises. Plain text is obtained through preprocessing such as de-duplication and de-noising, and the texts are stored with a uniform encoding and a predefined format to build an enterprise public opinion corpus for the financial field. The texts in the corpus are then split into clauses, shallow syntactic analysis is performed on each clause, and the syntactic components of each clause are analyzed to form a syntax tree.
Illustratively, taking the sentence "The Shenzhen Stock Exchange asks the listed company to explain whether there is insider trading" as an example, performing shallow syntactic analysis after word segmentation yields the syntax tree structure shown in FIG. 4.
Step 203: classifying the words with the same function on the syntax tree into the same word category based on the functional grammar theory.
Words with the same syntactic function often occupy the same node on the syntax tree, so words on the same node can be classified into the same word category.
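One way to picture this grouping is sketched below. It assumes each clause's shallow parse is available as a list of `(node_label, words)` pairs; that representation, the labels `NP` and `VP`, and the function name are illustrative assumptions, not the structure used in the embodiment.

```python
from collections import defaultdict

def group_by_node(parsed_clauses):
    """Collect words that occupy the same node label into one word category."""
    categories = defaultdict(set)
    for clause in parsed_clauses:
        for label, words in clause:
            categories[label].update(words)
    return dict(categories)

# Hypothetical shallow parses of two clauses.
parsed = [
    [("NP", ["exchange"]), ("VP", ["asks"]), ("NP", ["company"])],
    [("NP", ["regulator"]), ("VP", ["fines"]), ("NP", ["company"])],
]
categories = group_by_node(parsed)
```

Categories whose label matches a preset category label would then supply the candidate words for the next step.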
In the embodiment, semi-automatic dictionary construction can be realized based on the functional grammar theory, efficiency can be improved, and the cost for constructing the dictionary is saved.
204, taking each word in the word category with the preset category label as a candidate word, and obtaining the co-occurrence word of each candidate word by combining the context to form a candidate word set.
Specifically, the method may perform a co-occurrence analysis on each candidate word in combination with a context environment, further perform filtering, and screen out an anti-word or a synonym of each candidate word to form a candidate word set, where the candidate word set includes the candidate word and the anti-word or the synonym of the candidate word.
The similarity between the candidate words and the co-occurrence words can be calculated through the following formula so as to determine the synonyms with the similarity higher than a preset value:
[Formula image BDA0002590520770000111: Sim(A, B) defined in terms of common(A, B) and description(A, B)]
where Sim(A, B) represents the similarity between word A and word B, common(A, B) represents the information common to word A and word B, and description(A, B) represents all the information describing word A and word B.
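As a rough illustration of comparing common information with total information, the sketch below approximates both quantities with sets of context words. The set-overlap approximation, the function names `context_similarity` and `synonyms_above`, the threshold and the sample contexts are all assumptions made for illustration; the formula in the patent figure may differ.

```python
def context_similarity(contexts_a, contexts_b):
    """Ratio of shared context words to all context words of two words (0.0 to 1.0)."""
    a, b = set(contexts_a), set(contexts_b)
    union = a | b
    if not union:
        return 0.0  # no information about either word
    return len(a & b) / len(union)

def synonyms_above(candidate, cooccurring, contexts, threshold):
    """Return co-occurring words whose similarity to the candidate exceeds the threshold."""
    return [w for w in cooccurring
            if context_similarity(contexts[candidate], contexts[w]) > threshold]

# Hypothetical context words observed around each word in a corpus.
contexts = {
    "decline": {"profit", "revenue", "sharply", "year"},
    "drop": {"profit", "revenue", "sharply", "price"},
    "announce": {"company", "board", "plan"},
}
similar = synonyms_above("decline", ["drop", "announce"], contexts, threshold=0.5)
```

Here "decline" and "drop" share three of five distinct context words, so only "drop" passes the hypothetical threshold of 0.5.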
205, screening the expansion words of each seed emotion word from the candidate word set, and updating the seed dictionary according to the expansion words of each seed emotion word to construct an emotion dictionary.
The words in the candidate word set can be screened with manual assistance: wrong or inappropriate words are eliminated according to the seed emotion words, and the synonyms or antonyms of the seed emotion words are retained as their expansion words.
Each expansion word is then marked with an emotion intensity according to the emotion intensity of its seed emotion word: if the expansion word is a synonym, it is marked with the same emotion intensity; if it is an antonym, it is marked with the opposite emotion intensity. The expansion words of each seed emotion word and their emotion intensities are added to the seed dictionary to obtain the emotion dictionary.
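The marking rule for expansion words can be sketched as follows. This is a minimal illustration, assuming that "opposite emotion intensity" means the same magnitude with the sign flipped; the function name `expand_dictionary`, the relation labels and the sample words are hypothetical.

```python
def expand_dictionary(seed_dict, expansions):
    """Return a new dictionary with expansion words added.

    seed_dict: {seed word: emotion intensity}
    expansions: iterable of (seed word, expansion word, relation),
                where relation is "synonym" or "antonym".
    """
    emotion_dict = dict(seed_dict)
    for seed, word, relation in expansions:
        intensity = seed_dict[seed]
        if relation == "synonym":
            emotion_dict[word] = intensity    # same intensity as the seed word
        elif relation == "antonym":
            emotion_dict[word] = -intensity   # assumed: opposite sign, same magnitude
        else:
            raise ValueError(f"unknown relation: {relation}")
    return emotion_dict

seed = {"loss": -3, "growth": 2}
emotion_dict = expand_dictionary(seed, [("loss", "deficit", "synonym"),
                                        ("loss", "profit", "antonym")])
```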
It is understood that the emotion words in the emotion dictionary can be continuously expanded by repeating the above steps 202 to 205 to realize the continuous expansion of the emotion dictionary.
In this embodiment, the emotion words extracted on the basis of the seed dictionary and context rules are of high quality and can be quickly applied to an online production system; meanwhile, the emotion dictionary can be continuously expanded as the corpus accumulates, which improves the efficiency of constructing the emotion dictionary and saves manual work.
In an embodiment, referring to fig. 4, in the step 103, the calculating the emotion intensity value of the field text based on the preset emotion intensity of each emotion word, the clause where each emotion word is located, and the preset position weight of each clause at different positions of the field text may include the steps of:
401, calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs.
In one example, the implementation of step 401 may include:
positioning clauses where the emotional words are respectively located; and aiming at each positioned clause, dividing each emotion word in the clause into corresponding emotion polarities according to the preset emotion intensity of the emotion word in the clause, and summing the preset emotion intensities of the emotion words in the clause according to different emotion polarities to obtain the emotion intensity values of the clause under different emotion polarities.
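The per-polarity summation for one clause can be sketched as follows. This is a minimal illustration, assuming two polarities (positive and negative) determined by the sign of the preset emotion intensity; the function name `clause_polarity_sums` and the sample dictionary are hypothetical.

```python
def clause_polarity_sums(clause_words, emotion_dict):
    """Sum the preset intensities of a clause's emotion words separately per polarity."""
    sums = {"positive": 0.0, "negative": 0.0}
    for word in clause_words:
        intensity = emotion_dict.get(word)
        if intensity is None:
            continue  # not an emotion word
        polarity = "positive" if intensity > 0 else "negative"
        sums[polarity] += intensity
    return sums

# Hypothetical emotion dictionary and segmented clause.
emotion_dict = {"profit": 2, "loss": -3, "decline": -1}
sums = clause_polarity_sums(["profit", "loss", "decline", "company"], emotion_dict)
```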
In another example, the implementation of step 401 may include:
adjusting the preset emotion intensity of each emotion word based on the font form of each emotion word;
and calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity adjusted by each emotion word belongs.
Web page text often uses different font sizes or weights to distinguish the importance of information. Text in capital letters or bold type generally expresses stronger emotion, so the emotion words it contains should receive a higher emotion intensity. Accordingly, whether the font form of each emotion word is a preset font form is identified; if so, the emotion incentive value assigned in advance to that preset font form by an expert experience method is obtained, and the preset emotion intensity of the emotion word is adjusted by positive or negative excitation according to the emotion polarity of the emotion word. For example, suppose the preset emotion incentive value of the bold font form is 0.28. If the preset emotion intensity of a bold emotion word is -3, the word expresses negative emotion, so negative excitation is applied and the adjusted preset emotion intensity is -3.28. If the preset emotion intensity of a bold emotion word is 3, the word expresses positive emotion, so positive excitation is applied and the adjusted preset emotion intensity is 3.28.
After the preset emotion intensity of each emotion word is adjusted, the clauses where the emotion words are respectively located are positioned; for each positioned clause, each emotion word in the clause is assigned to the corresponding emotion polarity according to its adjusted preset emotion intensity, and the adjusted preset emotion intensities of the emotion words in the clause are summed separately for each emotion polarity to obtain the emotion intensity values of the clause under the different emotion polarities.
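The font-based adjustment can be sketched as follows, using the bold incentive value 0.28 from the example above. The function name `adjust_for_font`, the `FONT_INCENTIVE` table and the font form label `"regular"` are hypothetical; how font forms are detected is not covered here.

```python
# Incentive value per preset font form; 0.28 for bold is the value used in the text's example.
FONT_INCENTIVE = {"bold": 0.28}

def adjust_for_font(intensity, font_form):
    """Push the preset intensity further from zero when the word uses a preset font form."""
    incentive = FONT_INCENTIVE.get(font_form, 0.0)
    if intensity > 0:
        return intensity + incentive  # positive excitation
    if intensity < 0:
        return intensity - incentive  # negative excitation
    return intensity

adjusted_negative = adjust_for_font(-3, "bold")
adjusted_positive = adjust_for_font(3, "bold")
```

With these inputs the adjusted values correspond to the -3.28 and 3.28 of the example, up to floating-point rounding.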
In the embodiment, when the emotion intensity value of the clause where each emotion word is located is calculated, the preset emotion intensity of each emotion word is adjusted according to the font form of each emotion word, and the font form of the emotion word is taken into consideration, so that the emotion intensity value of the clause where the emotion word is located can be calculated more accurately and objectively.
402, calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause.
In one example, the implementation of step 402 may include:
and according to the preset position weight of each clause, carrying out weighted summation on the emotion intensity values of the clauses where the emotion words are located according to different emotion polarities, and calculating to obtain the emotion intensity values of the field text under different emotion polarities.
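The position-weighted summation can be sketched as follows. This is a minimal illustration, assuming one weight per clause in text order; the function name `text_polarity_values` and the weights 1.5 and 1.0 are hypothetical and are not values given in the patent.

```python
def text_polarity_values(clause_sums, position_weights):
    """Weighted sum of clause-level intensity values, kept separate per polarity."""
    if len(clause_sums) != len(position_weights):
        raise ValueError("one position weight is required per clause")
    totals = {"positive": 0.0, "negative": 0.0}
    for sums, weight in zip(clause_sums, position_weights):
        for polarity in totals:
            totals[polarity] += weight * sums[polarity]
    return totals

# Hypothetical clause-level values and position weights (first clause weighted higher).
clause_sums = [{"positive": 2.0, "negative": 0.0},
               {"positive": 0.0, "negative": -4.0}]
text_values = text_polarity_values(clause_sums, [1.5, 1.0])
```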
In another example, the implementation of step 402 may include:
adjusting the emotion intensity value of the clause in which each emotion word is positioned under each emotion polarity based on punctuation marks, degree adverbs and/or negative words in the clause in which each emotion word is positioned;
and calculating the emotion intensity value of the field text under each emotion polarity based on the preset position weight of each clause and the adjusted emotion intensity value of the clause where each emotion word is positioned under each emotion polarity.
Web page text also tends to use different punctuation marks to distinguish the importance of information. Sentences that use exclamation marks or question marks generally express stronger emotion and should therefore receive a higher emotion intensity. For example, "I like this company" and "I like this company!!!" differ in emotion intensity because of the punctuation, the latter expressing a stronger positive emotion. In a specific application, whether the clause in which the emotion word is located includes a preset punctuation mark (such as an exclamation mark or a question mark) can be identified; if so, the emotion incentive value assigned in advance to the preset punctuation mark by an expert experience method is obtained, and the emotion intensity values of the clause under each emotion polarity are adjusted. For example, the emotion incentive values corresponding to the exclamation mark and the question mark are 0.29 and 0.18, respectively; if the clause includes an exclamation mark, the emotion intensity value of the clause under negative emotion receives a negative-excitation adjustment (-0.292), and the emotion intensity value of the clause under positive emotion receives a positive-excitation adjustment (+0.292).
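The punctuation-based adjustment can be sketched as follows, using the incentive values 0.29 and 0.18 stated for the exclamation mark and the question mark. Two choices here are assumptions, not statements from the patent: each mark type is counted once per clause no matter how often it repeats, and a polarity whose value is zero is left unchanged. The function name `adjust_for_punctuation` is hypothetical.

```python
# Incentive values stated in the text for the exclamation mark and the question mark.
PUNCT_INCENTIVE = {"!": 0.29, "?": 0.18}

def adjust_for_punctuation(clause_text, sums):
    """Strengthen a clause's per-polarity values when preset punctuation marks occur."""
    adjusted = dict(sums)
    for mark, incentive in PUNCT_INCENTIVE.items():
        if mark not in clause_text:
            continue
        if adjusted["positive"] > 0:
            adjusted["positive"] += incentive  # positive excitation
        if adjusted["negative"] < 0:
            adjusted["negative"] -= incentive  # negative excitation
    return adjusted

plain = adjust_for_punctuation("I like this company", {"positive": 2.0, "negative": 0.0})
excited = adjust_for_punctuation("I like this company!!!", {"positive": 2.0, "negative": 0.0})
```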
In addition, emotion words are often accompanied in context by degree adverbs, such as "very", which express different degrees of emotion intensity, and corresponding emotion incentive values can be assigned in advance to the different degree adverbs by an expert experience method. In a specific application, whether the clause where the emotion word is located contains a degree adverb can be recognized through a preset adverb dictionary; if so, the emotion incentive value assigned in advance to the degree adverb is obtained, and the emotion intensity values of the clause containing the adverb under each emotion polarity, that is, under negative emotion and under positive emotion, are adjusted accordingly.
In addition, negative words may appear in the context of emotion words. A negative word generally reverses the emotion polarity of an emotion word, so the polarity of a clause may be shifted according to whether the clause in which the emotion word is located contains a negative word. For example, "but" connects two clauses with contrasting emotions, and the dominant emotion is usually that of the latter clause. In "The enterprise's revenue is increasing this year, but its profits are decreasing.", the former clause is positive and the latter clause is negative, and the emotion after "but" clearly dominates. In a specific application, whether the clause where an emotion word is located contains a negative word can be identified through a preset negative word list; if a negative word is identified, the emotion intensity values of the clause under the different emotion polarities are adjusted in the reverse direction. Specifically, the emotion intensity value under negative emotion can be adjusted to the emotion intensity value under positive emotion, and the emotion intensity value under positive emotion can be adjusted to the emotion intensity value under negative emotion.
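The reverse adjustment for negative words can be sketched as follows. This is a minimal illustration, assuming that swapping the two polarities also flips the sign so that positive values stay non-negative and negative values stay non-positive; the function name `apply_negation` and the contents of `NEGATION_WORDS` are hypothetical stand-ins for the preset negative word list.

```python
NEGATION_WORDS = {"not", "no", "never"}  # hypothetical preset negative word list

def apply_negation(clause_words, sums):
    """Swap a clause's positive and negative values when it contains a negative word."""
    if not NEGATION_WORDS.intersection(clause_words):
        return dict(sums)
    return {
        "positive": -sums["negative"],  # former negative value becomes positive
        "negative": -sums["positive"],  # former positive value becomes negative
    }

negated = apply_negation(["profit", "not", "growing"], {"positive": 2.0, "negative": 0.0})
unchanged = apply_negation(["profit", "growing"], {"positive": 2.0, "negative": 0.0})
```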
In the embodiment, when the emotion intensity values of the field text under different emotion polarities are calculated, the emotion intensity of each emotion word and the font form of the emotion word are considered, and multiple characteristics of punctuation marks, degree adverbs, negative words and the like in the clause where the emotion word is located are also considered, so that the emotion intensity values of the field text under different emotion polarities can be calculated more accurately, objectively and comprehensively.
In one embodiment, the method further comprises:
summing the normalized emotion intensity values of the domain texts under the emotion polarities to obtain a comprehensive value of the emotion intensity values of the domain texts;
and when the comprehensive value of the emotion intensity value of the field text meets a preset early warning trigger condition, generating early warning information and pushing the early warning information to a preset terminal.
The preset early warning trigger condition includes a preset emotion threshold interval. When the comprehensive value of the emotion intensity value of the field text falls within the preset emotion threshold interval, early warning information for the target enterprise related to the field text is generated and pushed to a preset terminal. In practical application, the emotion threshold interval can be set to [-1, -0.2], and the early warning information can be pushed to the preset terminal either automatically by a task or by manual triggering.
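The early warning check can be sketched as follows. This is a minimal illustration, assuming the threshold interval is closed and reads [-1, -0.2]; the function names `composite_value` and `should_warn` and the sample normalized values are hypothetical, and the push to the terminal is left out.

```python
def composite_value(normalized):
    """Sum the normalized emotion intensity values over all emotion polarities."""
    return sum(normalized.values())

def should_warn(value, low=-1.0, high=-0.2):
    """True when the comprehensive value falls inside the closed threshold interval."""
    return low <= value <= high

# Hypothetical normalized values for one field text.
value = composite_value({"positive": 0.25, "negative": -0.75})
warn = should_warn(value)
```

A caller would generate and push the early warning information only when `should_warn` returns `True`.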
In this embodiment, the early warning information can be generated and pushed to the preset terminal, so that relevant business personnel can obtain the risk information of the target enterprise as early as possible, and a data basis is provided for risk analysis of the target enterprise.
Fig. 5 is a block diagram illustrating a domain text emotion recognition apparatus according to an embodiment of the present invention. The apparatus may be configured in any computer device, so that the computer device can execute the domain text emotion recognition method according to the above embodiments. The computer device may be any of various terminals, such as a server, which may be implemented as a single server or as a cluster of servers.
Referring to fig. 5, a domain text emotion recognition apparatus according to an embodiment of the present invention may include:
the preprocessing module 51 is used for performing sentence segmentation and word segmentation on the field text to be recognized;
the matching module 52 is configured to match the segmented words of the field text in a preset emotion dictionary, determine at least one emotion word and the preset emotion intensity of each emotion word, and determine the clause in which each emotion word is located;
the calculation module 53 is configured to calculate emotion intensity values of the field text and normalize the emotion intensity values based on preset emotion intensity of each emotion word, a clause where each emotion word is located, and preset position weights of each clause at different positions of the field text;
and the output module 54 is configured to output a text emotion recognition result based on the normalized emotion intensity value of the domain text.
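The four modules can be sketched end to end as follows. This is a simplified illustration under several assumptions that are not taken from the patent: clauses are split on common punctuation, words are split with a regular expression (a real system for Chinese text would need a word segmenter), normalization divides by the sum of the magnitudes of the two polarities, and the output is the dominant polarity. All function names, the dictionary and the position weights are hypothetical.

```python
import re

def preprocess(text):
    """Module 51: split the text into clauses, then each clause into words."""
    clauses = [c.strip() for c in re.split(r"[.!?;,]+", text) if c.strip()]
    return [re.findall(r"\w+", c.lower()) for c in clauses]

def match(clauses, emotion_dict):
    """Module 52: find emotion words, their preset intensity and their clause index."""
    return [(word, emotion_dict[word], i)
            for i, words in enumerate(clauses)
            for word in words if word in emotion_dict]

def calculate(matches, position_weights):
    """Module 53: position-weighted sums per polarity, then normalization."""
    totals = {"positive": 0.0, "negative": 0.0}
    for _, intensity, idx in matches:
        polarity = "positive" if intensity > 0 else "negative"
        totals[polarity] += position_weights[idx] * intensity
    scale = totals["positive"] - totals["negative"]  # assumed: sum of magnitudes
    if scale == 0:
        return totals
    return {p: v / scale for p, v in totals.items()}

def output(normalized):
    """Module 54: report the dominant polarity (assumed decision rule)."""
    total = normalized["positive"] + normalized["negative"]
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

emotion_dict = {"growth": 2, "loss": -3}  # hypothetical emotion dictionary
clauses = preprocess("Revenue shows growth, but profit shows a loss.")
normalized = calculate(match(clauses, emotion_dict), position_weights=[1.0, 1.5])
result = output(normalized)
```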
In one embodiment, the apparatus further comprises a building module 50, the building module 50 being specifically configured to:
constructing a seed dictionary in a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight;
splitting the texts in a preset corpus of the specific field into clauses, performing shallow syntactic analysis on each obtained clause, analyzing the syntactic components of each clause, and forming a syntax tree;
based on a functional grammar theory, classifying words with the same function on a syntax tree into the same word category;
taking each word in the word categories with the preset category labels as candidate words respectively, and acquiring the co-occurrence words of each candidate word by combining the context to form a candidate word set;
and screening the expansion words of each seed emotion word from the candidate word set, and updating the seed dictionary according to the expansion words of each seed emotion word to construct an emotion dictionary.
In one embodiment, the calculation module 53 includes:
the first calculation submodule is used for calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs;
and the second calculation submodule is used for calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause.
In one embodiment, the first computation submodule is specifically configured to:
adjusting the preset emotion intensity of each emotion word based on the font form of each emotion word;
and calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity adjusted by each emotion word belongs.
In one embodiment, the second computation submodule is specifically configured to:
adjusting the emotion intensity value of the clause in which each emotion word is positioned under each emotion polarity based on punctuation marks, degree adverbs and/or negative words in the clause in which each emotion word is positioned;
and calculating the emotion intensity value of the field text under each emotion polarity based on the preset position weight of each clause and the adjusted emotion intensity value of the clause where each emotion word is positioned under each emotion polarity.
In one embodiment, the output module 54 is further configured to:
summing the normalized emotion intensity values of the domain texts under the emotion polarities to obtain a comprehensive value of the emotion intensity values of the domain texts;
and when the comprehensive value of the emotion intensity value of the field text meets a preset early warning trigger condition, generating early warning information and pushing the early warning information to a preset terminal.
It should be noted that: in the domain text emotion recognition device provided in the embodiment of the present invention, only the division of each function module is exemplified, and in practical applications, the function distribution may be completed by different function modules as needed, that is, the internal structure of the device is divided into different function modules to complete all or part of the functions described above. In addition, specific implementation processes and beneficial effects of the domain text emotion recognition device in this embodiment are detailed in the domain text emotion recognition method in the embodiment, and are not described herein again.
Fig. 6 is an internal structural diagram of a computer device according to an embodiment of the present invention. The computer device may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a domain text emotion recognition method.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only the portion of the configuration relevant to aspects of the present invention and does not limit the computer devices to which aspects of the present invention may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, there is also provided a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text in a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause where each emotion word is located;
calculating emotion intensity values of the field text and normalizing the emotion intensity values based on preset emotion intensity of each emotion word, the clause where each emotion word is located and preset position weights of the clauses at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
In one embodiment, there is also provided a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor implementing the steps of:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text in a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause where each emotion word is located;
calculating emotion intensity values of the field text and normalizing the emotion intensity values based on preset emotion intensity of each emotion word, the clause where each emotion word is located and preset position weights of the clauses at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above examples show only some embodiments of the present invention, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for recognizing emotion of a domain text, the method comprising:
performing sentence division and word division on the field text to be recognized;
matching the segmented words of the field text in a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause in which each emotion word is located;
calculating and normalizing emotion intensity values of the field text based on preset emotion intensity of each emotion word, a clause where each emotion word is located and preset position weight of each clause at different positions of the field text;
and outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
2. The method of claim 1, wherein the emotion dictionary is constructed by:
constructing a seed dictionary in a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight;
carrying out clause segmentation on the text in the preset corpus of the specific field, carrying out shallow syntactic analysis on each obtained clause, analyzing the syntactic components of each clause and forming a syntactic tree;
based on a functional grammar theory, classifying words with the same function on the syntax tree into the same word category;
taking each word in the word category with the preset category label as a candidate word respectively, and acquiring a co-occurrence word of each candidate word by combining the context to form a candidate word set;
and screening the expansion words of the seed emotion words from the candidate word set, and updating the seed dictionary according to the expansion words of the seed emotion words to construct the emotion dictionary.
3. The method of claim 1, wherein calculating the emotion intensity value of the field text based on the preset emotion intensity of each emotion word, the clause where each emotion word is located, and the preset position weight of each clause at different positions of the field text comprises:
calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs;
and calculating the emotion intensity value of the field text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause.
4. The method according to claim 3, wherein the calculating of the emotion intensity value of the clause in which each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity of each emotion word belongs includes:
adjusting the preset emotion intensity of each emotion word based on the font form of each emotion word;
and calculating the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on the emotion polarity to which the preset emotion intensity adjusted by each emotion word belongs.
5. The method according to claim 3 or 4, wherein the calculating of the emotion intensity value of the domain text under each emotion polarity based on the emotion intensity value of the clause where each emotion word is located under each emotion polarity and the preset position weight of each clause comprises:
adjusting the emotion intensity value of the clause where each emotion word is located under each emotion polarity based on punctuation marks, degree adverbs and/or negative words in the clause where each emotion word is located;
and calculating the emotion intensity value of the field text under each emotion polarity based on the preset position weight of each clause and the adjusted emotion intensity value of the clause where each emotion word is located under each emotion polarity.
6. The method of claim 5, further comprising:
summing the normalized emotion intensity values of the domain texts under the emotion polarities to obtain a comprehensive value of the emotion intensity values of the domain texts;
and when the comprehensive value of the emotion intensity value of the field text meets a preset early warning trigger condition, generating early warning information and pushing the early warning information to a preset terminal.
7. A domain text emotion recognition apparatus, characterized in that the apparatus comprises:
the preprocessing module is used for segmenting sentences and words of the field text to be recognized;
the matching module is used for matching the segmented words of the field text in a preset emotion dictionary, determining at least one emotion word and the preset emotion intensity of each emotion word, and determining the clause where each emotion word is located;
the calculation module is used for calculating and normalizing the emotion intensity value of the field text based on the preset emotion intensity of each emotion word and the preset position weight of each clause at different positions of the field text;
and the output module is used for outputting a text emotion recognition result based on the normalized emotion intensity value of the field text.
8. The apparatus according to claim 7, further comprising a construction module, the construction module being specifically configured to:
constructing a seed dictionary in a specific field, wherein each seed emotion word contained in the seed dictionary is marked with corresponding emotion intensity and emotion weight;
carrying out clause segmentation on the text in the preset corpus of the specific field, carrying out shallow syntactic analysis on each obtained clause, analyzing the syntactic components of each clause and forming a syntactic tree;
based on a functional grammar theory, classifying words with the same function on the syntax tree into the same word category;
taking each word in the word category with the preset category label as a candidate word respectively, and acquiring a co-occurrence word of each candidate word by combining the context to form a candidate word set;
and screening the expansion words of the seed emotion words from the candidate word set, and updating the seed dictionary according to the expansion words of the seed emotion words to construct the emotion dictionary.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of domain text emotion recognition as recited in any of claims 1 to 6 when the computer program is executed.
10. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out a method for emotion recognition of a domain text as claimed in any one of claims 1 to 6.
CN202010694597.XA 2020-07-17 2020-07-17 Domain text emotion recognition method and device, computer equipment and storage medium Withdrawn CN112036175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010694597.XA CN112036175A (en) 2020-07-17 2020-07-17 Domain text emotion recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010694597.XA CN112036175A (en) 2020-07-17 2020-07-17 Domain text emotion recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112036175A true CN112036175A (en) 2020-12-04

Family

ID=73579210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010694597.XA Withdrawn CN112036175A (en) 2020-07-17 2020-07-17 Domain text emotion recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112036175A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807920A (en) * 2017-11-17 2018-03-16 新华网股份有限公司 Construction method, device and the server of mood dictionary based on big data
CN108717406A (en) * 2018-05-10 2018-10-30 平安科技(深圳)有限公司 Text mood analysis method, device and storage medium
CN110609996A (en) * 2018-06-15 2019-12-24 阿里巴巴集团控股有限公司 Text emotion recognition method and device and electronic equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361267A (en) * 2021-06-29 2021-09-07 招商局金融科技有限公司 Sample data generation method, device, equipment and storage medium
CN113361267B (en) * 2021-06-29 2024-02-09 招商局金融科技有限公司 Sample data generation method, device, equipment and storage medium
CN114385894A (en) * 2021-12-30 2022-04-22 粤开证券股份有限公司 Public opinion monitoring method and device based on dictionary
CN114385894B (en) * 2021-12-30 2024-05-31 粤开证券股份有限公司 Dictionary-based public opinion monitoring method and device

Similar Documents

Publication Publication Date Title
CN110765763B (en) Error correction method and device for voice recognition text, computer equipment and storage medium
US11151130B2 (en) Systems and methods for assessing quality of input text using recurrent neural networks
US11093854B2 (en) Emoji recommendation method and device thereof
CN109685056B (en) Method and device for acquiring document information
CN112163424B (en) Data labeling method, device, equipment and medium
Mohanty et al. Resumate: A prototype to enhance recruitment process with NLP based resume parsing
CN111767716B (en) Method and device for determining enterprise multi-level industry information and computer equipment
CN107229612B (en) Network information semantic tendency analysis method and system
CN113837531A (en) Product quality problem finding and risk assessment method based on network comments
CN109165295B (en) Intelligent resume evaluation method
CN112380346B (en) Financial news emotion analysis method and device, computer equipment and storage medium
CN112364628B (en) New word recognition method and device, electronic equipment and storage medium
CN113704436A (en) User portrait label mining method and device based on session scene
CN112036185B (en) Method and device for constructing named entity recognition model based on industrial enterprise
CN112036175A (en) Domain text emotion recognition method and device, computer equipment and storage medium
CN114219337A (en) Service quality evaluation method, system, equipment and readable storage medium
CN115687621A (en) Short text label labeling method and device
CN107783958B (en) Target statement identification method and device
CN117540004B (en) Industrial domain intelligent question-answering method and system based on knowledge graph and user behavior
CN114265931A (en) Big data text mining-based consumer policy perception analysis method and system
CN113515587B (en) Target information extraction method, device, computer equipment and storage medium
CN110888977B (en) Text classification method, apparatus, computer device and storage medium
CN110941713A (en) Self-optimization financial information plate classification method based on topic model
CN114676699A (en) Entity emotion analysis method and device, computer equipment and storage medium
CN113609297A (en) Public opinion monitoring method and device for court industry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2020-12-04)