CN115878752A - Text emotion analysis method, device, equipment, medium and program product - Google Patents

Publication number
CN115878752A
Authority
CN
China
Prior art keywords
emotion
vocabulary
word
words
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111658765.0A
Other languages
Chinese (zh)
Inventor
李树海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Abstract

The application discloses a text emotion analysis method, device, equipment, medium and program product, and relates to the technical field of computers. The method comprises the following steps: acquiring a target text; performing word segmentation processing on the target text to obtain word segmentation words in the target text; acquiring a modified vocabulary in the target text, wherein the distance between the modified vocabulary and the word segmentation words is within a character number threshold range; determining the emotion degree corresponding to the word segmentation words based on the modified vocabulary and the matching relation between the word segmentation words and an emotion word library; and determining an emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation words in the target text. Because the word segmentation words and the modified vocabulary are analyzed together to determine the emotion analysis result of the target text, more factors are considered when emotion analysis is performed on the target text, and the analysis result is more accurate. The embodiments of the application can be applied to various scenarios such as cloud technology, artificial intelligence and intelligent traffic.

Description

Text emotion analysis method, device, equipment, medium and program product
This application claims priority to Chinese patent application No. 202111151765.1, entitled "method, apparatus, device, medium, and program product for analyzing textual emotion", filed on September 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for analyzing text sentiment.
Background
With the development of the mobile Internet, text data has become increasingly diverse. In analyzing such text data, emotion analysis plays an increasingly important role in understanding the text content; performing emotion analysis on text content yields valuable knowledge and information, and has important practical value for information prediction and electronic commerce.
In the related art, emotion analysis is generally performed directly on the word segmentation words in the target text to be analyzed to obtain the emotion degree corresponding to each word segmentation word, and the emotional tendency of the target text is then determined from the emotion degrees corresponding to the different word segmentation words.
However, analyzing only the word segmentation words in the target text easily ignores the influence that other words in the target text have on the word segmentation words, or ignores the content information of the sentences in the target text, so the accuracy of emotion analysis on the target text is low.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment, a medium and a program product for analyzing text emotion, which can effectively improve the accuracy of text emotion analysis. The technical scheme is as follows.
In one aspect, a method for analyzing text emotion is provided, and the method includes:
acquiring a target text, wherein the target text is a text to be subjected to emotion analysis;
performing word segmentation processing on the target text to obtain word segmentation words in the target text;
acquiring a modified vocabulary in the target text whose distance from the word segmentation words is within a character number threshold range, wherein the modified vocabulary is used for modifying the word segmentation words;
determining the emotion degree corresponding to the word segmentation words based on the matching relation between the word segmentation words and an emotion word library and the modified words, wherein the emotion word library comprises emotion words marked with basic emotion degrees;
and determining an emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation words in the target text.
In another aspect, an apparatus for analyzing text emotion is provided, the apparatus including:
the text acquisition module is used for acquiring a target text, wherein the target text is a text to be subjected to emotion analysis;
the word segmentation processing module is used for carrying out word segmentation processing on the target text to obtain word segmentation words in the target text;
the vocabulary acquisition module is used for acquiring a modified vocabulary of which the distance from the segmented vocabulary in the target text is within a character number threshold range, wherein the modified vocabulary is used for modifying the segmented vocabulary;
the degree determining module is used for determining the emotion degree corresponding to the word segmentation words based on the matching relation between the word segmentation words and an emotion word library and the modified words, wherein the emotion word library comprises emotion words marked with basic emotion degrees;
and the result determining module is used for determining the emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation vocabulary in the target text.
In another aspect, a computer device is provided and includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for analyzing text emotion as described in any of the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and loaded and executed by a processor to implement the method for analyzing text emotion as described in any of the embodiments of the present application.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the text emotion analysis method in any one of the above embodiments.
The beneficial effects brought by the technical solutions of the embodiments of the present application include at least the following:
First, word segmentation processing is performed on the acquired target text to obtain word segmentation words. Modified vocabulary is then searched for within a certain number of characters of each word segmentation word; because the modified vocabulary modifies the word segmentation word, it can influence the basic emotion degree of that word. The word segmentation words are then analyzed in combination with the modified vocabulary to obtain a more accurate emotion degree for each word segmentation word, and finally the emotion analysis result of the target text is determined based on the emotion degrees corresponding to the word segmentation words in the target text. In this way, the word segmentation words and the modified vocabulary within a character number threshold range are analyzed together to determine the emotion analysis result of the target text, so that the range in which modified vocabulary is sought is well defined, more factors are considered, and the analysis result of the target text is more accurate.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for analyzing text sentiment provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a method for analyzing textual emotion as provided by another exemplary embodiment of the present application;
FIG. 4 is a flowchart of a method for analyzing text sentiment provided by another exemplary embodiment of the present application;
FIG. 5 is a process diagram of a method for analyzing text emotion according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for analyzing text emotion according to an exemplary embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for analyzing text emotion according to another exemplary embodiment of the present application;
FIG. 8 is a block diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application will be briefly described.
Artificial Intelligence (AI): the theory, methods, technologies and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning and the like.
Natural Language Processing (NLP): an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers using natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graphs and the like.
Machine Learning (ML): a multi-disciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize its existing knowledge structure so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
In the related art, emotion analysis is generally performed directly on the word segmentation words in the target text to be analyzed to obtain the emotion degree corresponding to each word segmentation word, and the emotional tendency of the target text is then determined from the emotion degrees corresponding to the different word segmentation words. However, analyzing only the word segmentation words in the target text easily ignores the influence that other words in the target text have on the word segmentation words, or ignores the content information of the sentences in the target text, so the accuracy of emotion analysis on the target text is low.
The embodiments of the present application provide a text emotion analysis method in which more factors are considered when emotion analysis is performed on the target text, so that the analysis result is more accurate. The text emotion analysis method obtained by training is applied in at least one of the following scenarios.
1. The method is applied to electronic commerce.
Electronic commerce refers to users carrying out various business and financial activities with the Internet as the medium. Electronic commerce needs not only to create a good transaction environment, but also to find ways to satisfy users' needs as far as possible. Illustratively, a user's review of a commodity purchased online can be used as the target text for emotion analysis. Based on the processes of word segmentation, adjustment, emotion weight acquisition and emotion information determination performed on the target text, the target text can be automatically classified as a "good review", "medium review" or "bad review". An evaluation of the store can then be generated from these reviews, which encourages the store to improve its service and allows stores matching the user's needs to be recommended to the user.
2. The method is applied to intelligent traffic evaluation.
Traffic service software is installed in a vehicle-mounted terminal. When a user drives the vehicle to a certain position, the vehicle-mounted terminal prompts the user to evaluate the assisted driving; the user can evaluate by tapping the display interface of the vehicle-mounted terminal or by voice. When the user's feedback information is text, the vehicle-mounted terminal performs emotion analysis on the text and obtains its emotion analysis result. For example, the user evaluates the drive by voice input with the content "good and comfortable"; the traffic evaluation software installed in the vehicle-mounted terminal performs emotion analysis on this content and obtains a positive emotion analysis result. Optionally, the analysis result can be stored on the vehicle-mounted terminal.
It should be noted that the above application scenarios are only illustrative examples, and the method for text emotion analysis provided in this embodiment may also be applied to other scenarios, which are not limited in this embodiment.
Next, the implementation environment of the embodiments of the present application is described. Referring to FIG. 1, the implementation environment involves a terminal 110 and a server 120, which are connected through a communication network 130; the server 120 further includes an emotion analysis model 140.
In some embodiments, the terminal 110 is configured to send the target text to the server 120. Illustratively, the server 120 has a word segmentation processing function, a character extraction function, and a comparison function.
The server 120 includes an emotion analysis model 140, and after performing emotion analysis on the target text through the emotion analysis model 140, outputs an analysis result of the emotion of the target text, and feeds back the analysis result of the emotion of the target text to the terminal 110 for display.
First, the terminal 110 obtains a target text and sends it to the server 120. The emotion analysis model 140 in the server 120 performs word segmentation processing on the target text to obtain the word segmentation words in the target text. The emotion analysis model 140 then obtains the modified vocabulary in the target text whose distance from the word segmentation words is within a character number threshold range, and determines the emotion degree corresponding to the word segmentation words according to the modified vocabulary and the matching relation between the word segmentation words and the emotion word library. Finally, the emotion analysis model 140 obtains the emotion analysis result of the target text based on the emotion degrees corresponding to the word segmentation words in the target text. Optionally, the server 120 sends the obtained emotion analysis result to the terminal 110, and the terminal 110 displays it.
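The flow above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the emotion analysis model 140 itself: the names `EMOTION_LEXICON`, `MODIFIER_WEIGHTS` and `analyze` are hypothetical, regex tokenization of English text stands in for word segmentation, and the rules "multiply the basic emotion degree by a modifier weight" and "sum the degrees to get the result" are assumptions, since the combination rule is not fixed in this passage.

```python
import re

# Hypothetical emotion word library: emotion words marked with basic emotion degrees.
EMOTION_LEXICON = {"delicious": 2.0, "comfortable": 1.0, "bad": -2.0}
# Hypothetical modified vocabulary with assumed multiplicative weights.
MODIFIER_WEIGHTS = {"very": 1.5, "not": -1.0}


def analyze(text, threshold=10):
    """Return (label, score) for text; threshold is the character number threshold."""
    # Stand-in for word segmentation: each token keeps its character span.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    total = 0.0
    for word, w_start, w_end in tokens:
        if word not in EMOTION_LEXICON:
            continue  # no match in the emotion word library
        degree = EMOTION_LEXICON[word]
        for mod, m_start, m_end in tokens:
            if mod not in MODIFIER_WEIGHTS:
                continue
            # Character gap between the modifier and the emotion word.
            gap = w_start - m_end if m_end <= w_start else m_start - w_end
            if 0 <= gap <= threshold:
                degree *= MODIFIER_WEIGHTS[mod]  # assumed combination rule
        total += degree
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", total


print(analyze("The dish is very delicious"))
print(analyze("The dish is not delicious"))
```

With these assumed weights, a modifier that lies outside the threshold leaves the basic emotion degree of the word unchanged, which is the behavior the character number threshold is meant to control.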
It should be noted that the above terminal includes but is not limited to mobile terminals such as mobile phones, tablet computers, portable laptop computers, intelligent voice interaction devices, intelligent appliances and vehicle-mounted terminals, and can also be implemented as a desktop computer; the server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data and artificial intelligence platforms.
Cloud technology is a hosting technology that unifies resources such as hardware, application programs and networks in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. Cloud technology is the general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the Internet industry, each item may have its own identification mark that needs to be transmitted to a background system for logic processing; data of different levels will be processed separately, and all kinds of industry data require strong system background support, which can only be realized through cloud computing.
In some embodiments, the servers described above may also be implemented as nodes in a blockchain system. The Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. The block chain, which is essentially a decentralized database, is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The method for analyzing text emotion provided by the present application is described below with reference to the terms and application scenarios introduced above. Taking application of the method to a terminal as an example, as shown in FIG. 2, the method includes the following steps.
Step 210, obtaining a target text.
The target text is a text to be subjected to emotion analysis.
The text is a file type composed of at least one of characters, punctuation marks, pictures, tables, and the like. Optionally, text exists in many formats, such as pdf, mobi and doc, and also falls into many categories, such as academic paper text, prose text and web review text.
Text with emotion means that the characters, words or sentences in the text have a certain emotional tendency, such as positive, happy, prosperous, negative or flat; by reading the text, the reader becomes aware of the emotional tendency it expresses. Optionally, the emotional tendencies of the text can be summarized simply as positive (e.g., happy, praised), negative (e.g., pessimistic), and neutral (e.g., flat).
The target text is a text with a certain emotional tendency, and the emotion it expresses can be learned by analyzing its content. Illustratively, the words, sentences and the like in the target text are analyzed to obtain the emotional tendency of the target text, and different words, sentences and the like in the target text may carry different emotional tendencies. Emotion analysis of text is an application of natural language processing methods. Optionally, the analysis of text emotion can be realized by analyzing, processing, generalizing and reasoning over the emotionally colored sentence components in the text.
Illustratively, the target text is a subjective online review on a catering platform, with the content "the dish made by restaurant A is particularly delicious", in which the word "delicious" has a positive meaning, so the review can be preliminarily evaluated as having a positive emotional tendency. Alternatively, the text is a subjective online comment on a social platform, with the content "the D made by C lacks consideration", in which the word "lack" has a negative meaning, so the comment can be preliminarily evaluated as having a negative emotional tendency.
Optionally, the obtaining of the target text at least comprises the following modes:
1. randomly collected from the network.
Illustratively, when the evaluation condition of a restaurant is judged, 10 comments can be randomly selected from the comment area, and the extracted comments are used as the target text, so as to realize the acquisition process of the target text.
2. purposefully retrieved from a text library.
Illustratively, a user enters text in a search engine to find text content with a particular emotional tendency. For example, the user enters "inspiring story" in a search engine; based on this input, the search engine searches a text library for text content related to "inspiring". The text content found is the target text in the present application: it is extracted from the text library accessible to the search engine, which completes the acquisition of the target text.
In an alternative embodiment, the acquired text may be advertisement information, or may contain disfluent sentences, mixed-in spam content and the like. Optionally, to analyze the target text more accurately, obtaining the target text includes at least one of the following ways:
1. performing noise filtering on the text to obtain the target text.
Acquiring a text set, wherein the text set is a set of text contents acquired from a preset database; and carrying out noise filtration on the text set, and acquiring a target text from the text set after the noise filtration.
The database is in a collection form of at least one kind of data, and the database can store data in the forms of numbers, symbols and the like, and can also store data in the forms of words, texts and the like. Illustratively, at least one text is stored in the preset database, and the at least one text is acquired from the preset database to form a text set, where the text set may be one magazine, two composition sets, or five chapters in one novel and the like.
The noise filtering is a way of filtering out interference factors in texts and extracting high-quality text contents. Optionally, when the text set is a magazine, the digital text information, the emoticon information, the advertisement text content, and the like in the magazine may affect the analysis of the text emotion, so that when the text set is obtained, the text set may be subjected to noise filtering.
2. performing language judgment and post-processing on the text to obtain the target text.
When the target text is obtained, the obtained text may be a Chinese text or a text in another language, and texts in different languages may be processed differently. For example: 1. when the acquired text is a Chinese text, the Chinese text is used directly as the target text; 2. when the acquired text is an English text, the English text is translated to obtain a Chinese translation, and the Chinese translation is used as the target text; 3. when the acquired text is an English text, the language of the text is not converted, and the original English text is used directly as the target text. The above description is illustrative, and the present application is not limited thereto.
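The two preprocessing ways above (noise filtering, then language judgment and post-processing) can be sketched as follows. This is a rough illustration under assumptions: the function names, the advertisement keywords, the emoticon code-point ranges and the "more Chinese characters than Latin letters" rule are all hypothetical choices, and `translate` is a caller-supplied placeholder rather than a real translation API.

```python
import re

# Hypothetical noise rules: the text names the categories (numeric text,
# emoticons, advertisement content) but not concrete patterns.
DIGITS = re.compile(r"\d+")
EMOTICONS = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
AD_KEYWORDS = ("limited offer", "click to buy")  # assumed examples


def filter_noise(text_set):
    """Drop advertisement texts and strip digits and emoticons from the rest."""
    cleaned = []
    for text in text_set:
        if any(keyword in text.lower() for keyword in AD_KEYWORDS):
            continue  # treat the whole text as advertisement noise
        text = EMOTICONS.sub("", DIGITS.sub("", text))
        text = " ".join(text.split())  # collapse leftover whitespace
        if text:
            cleaned.append(text)
    return cleaned


def detect_language(text):
    """Return 'zh', 'en' or 'unknown' by comparing character counts (assumed rule)."""
    chinese = sum(1 for c in text if "\u4e00" <= c <= "\u9fff")
    latin = sum(1 for c in text if c.isascii() and c.isalpha())
    if chinese == 0 and latin == 0:
        return "unknown"
    return "zh" if chinese >= latin else "en"


def to_target_text(text, translate=None):
    """Translate English text when a translator is supplied; otherwise keep it as is."""
    if detect_language(text) == "en" and translate is not None:
        return translate(text)
    return text


print(filter_noise(["The dish is delicious 123", "Limited offer: click to buy now"]))
print(detect_language("The food is great"))
```

Passing `translate=None` corresponds to the option of keeping an English text unconverted, while supplying a translator corresponds to the option of using the Chinese translation as the target text.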
Step 220, performing word segmentation processing on the target text to obtain word segmentation words in the target text.
Word segmentation processing is to process the characters in the target text so that at least one character forms a word segmentation word with a certain meaning. Apart from punctuation marks, the characters in the target text are usually closely connected and lack obvious word boundaries, so it is difficult to identify the keywords in the target text directly by reading it.
If the target text is a Chinese text, the single Chinese character is the most basic semantic unit. Although most Chinese characters have their own meanings, a single character has weak expressive power and a scattered meaning, whereas words are more expressive and can describe objects more accurately. Therefore, the characters in the target text can be segmented by a character division method, that is, at least one character is combined into a word segmentation word according to the order in which the characters appear in the target text. For example, a composition mode in which every two characters form one word segmentation word is preset, and the characters in the target text are segmented accordingly.
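The preset "two characters form one word segmentation word" mode can be sketched in a few lines. The function name `split_fixed` and the sample phrase are illustrative assumptions, not taken from the application.

```python
def split_fixed(text, size=2):
    """Group characters into chunks of `size`, in the order they appear in the text."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Illustrative phrase; the last chunk is shorter when the length is not a multiple of size.
print(split_fixed("附近的房子"))
```

Because the chunks ignore word boundaries, a purely fixed-size division can produce pieces that are not meaningful words.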
Optionally, some adjacent characters cannot form a meaningful word when word segmentation is performed. For example, when segmenting "nearby house", "nearby" and "house" are meaningful word segmentation words, whereas combining the character "of" with an adjacent character of "nearby" or "house" into a word segmentation word easily produces ambiguity. Illustratively, when performing word segmentation, the characters may be input, in the order in which they appear in the target text, into a trained model for word segmentation, such as a dictionary word segmentation model, a Hidden Markov Model (HMM) or a Long Short-Term Memory (LSTM) model, so as to obtain word segmentation words with clearer meanings.
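As one possible reading of a "dictionary word segmentation model", the sketch below uses forward maximum matching, a common dictionary-based technique that the application does not itself name. The function name, the dictionary contents, the `max_len` parameter and the Chinese rendering of the "nearby house" example are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted so the loop makes progress.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Assumed dictionary containing the two meaningful words of the example phrase.
print(forward_max_match("附近的房子", {"附近", "房子"}))
```

Here the particle stays a separate token instead of being glued to a neighboring character, which avoids the ambiguous pieces that fixed-size division can create.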
Optionally, the character division method may be combined with the model processing method, so that the meaning of the word segmentation words is kept as clear as possible while they are divided at the chosen granularity. In addition, in order to distinguish characters from word segmentation words, the word segmentation words formed from the characters can be marked by a labeling method such as adding color or shading.
Step 230, obtaining a modified vocabulary in the target text, wherein the distance between the modified vocabulary and the word segmentation words is within the character number threshold range.
Wherein, the modification vocabulary is used for modifying the word segmentation vocabulary.
The characters in the target text depend on the language of the target text: when the target text is a Chinese text, the characters are Chinese characters; when the target text is an English text, the characters are English letters, and so on. The character number is the count of characters, and the distance from a word segmentation word is expressed in characters. For example, the characters at a distance of 4 from a word segmentation word are two characters: the 5th character before the first character of the word segmentation word and the 5th character after the last character of the word segmentation word. In "i forget to bring an identity card when traveling today", for the word segmentation word "traveling", the characters "i" and "share" are each at a distance of 4 characters from the word segmentation word.
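The distance convention stated above (a distance of 4 corresponds to the 5th character before the first character and the 5th character after the last character) can be sketched as follows. The function name `chars_at_distance` and the sample string are hypothetical, and the sketch only handles the first occurrence of the word.

```python
def chars_at_distance(text, word, distance):
    """Return the characters lying `distance` characters away from `word` on each side.

    Follows the convention in the text: distance 4 means the 5th character before
    the first character of the word and the 5th character after its last character.
    A side that falls outside the text yields None.
    """
    start = text.find(word)  # first occurrence only (simplifying assumption)
    if start == -1:
        return None, None
    end = start + len(word)  # index just past the last character of the word
    before_idx = start - distance - 1
    after_idx = end + distance
    before = text[before_idx] if before_idx >= 0 else None
    after = text[after_idx] if after_idx < len(text) else None
    return before, after


# "XY" plays the role of the word segmentation word in this made-up string.
print(chars_at_distance("abcdefghijXYklmnopqrst", "XY", 4))
```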
Optionally, the word segmentation vocabulary may be a noun, a verb, an adjective, and so on. Because the modified vocabulary modifies the word segmentation vocabulary, the modified vocabulary may differ according to the part of speech of the word segmentation vocabulary. For example: when the word segmentation vocabulary is a noun, the modified vocabulary is an adjective; when the word segmentation vocabulary is a verb, the modified vocabulary is a degree adverb; when the word segmentation vocabulary is an adjective, the modified vocabulary is a degree adverb, and so on. The number of modified vocabularies covers at least the following cases: 1. the number of modified vocabularies is 0, that is, no modified vocabulary modifies the word segmentation vocabulary; 2. the number of modified vocabularies is 1, that is, one modified vocabulary modifies the word segmentation vocabulary; 3. the number of modified vocabularies is greater than 1, that is, several modified vocabularies modify the word segmentation vocabulary. The analysis of the target text differs to some extent according to the number of modified vocabularies.
Optionally, the modified vocabulary is usually located near the word segmentation vocabulary and can therefore be obtained by searching near it; however, "near" by itself is too vague a range. Illustratively, when the modified vocabulary is acquired, only modified vocabularies whose distance from the word segmentation vocabulary is within a character number threshold range are acquired, where the character number threshold may be set in at least the following ways:
1. Only the character number threshold is preset.
Illustratively, only a character number threshold of 10 characters is preset. That is, when the distance to the word segmentation vocabulary is judged, only the characters within 10 characters of the word segmentation vocabulary are examined; in other words, it is only judged whether a modified vocabulary that modifies the word segmentation vocabulary exists within the 10 characters before it or the 10 characters after it.
2. The relative position between the characters and the word segmentation vocabulary is set together with the character number threshold.
Illustratively, when the character number threshold is set, both the number of characters and the relative position between the characters and the word segmentation vocabulary are specified. For example, the character number threshold is set to 5 characters located before the word segmentation vocabulary; when the distance to the word segmentation vocabulary is judged, it is only judged whether a modified vocabulary that modifies the word segmentation vocabulary exists within the 5 characters before it.
3. Setting a character number threshold range, wherein the specific character number threshold varies according to the number of characters in the target text.
Illustratively, when the character number threshold is set, no specific value is preset; instead, a range is set. When the target text is analyzed, a specific value may be chosen from the range according to the number of characters or the number of word segmentation vocabularies in the target text, or the specific value may be adjusted according to those numbers. For example, the character number threshold is 5 characters when the number of characters in the target text does not exceed 1000, and 10 characters when the number of characters exceeds 1000 or the number of word segmentation vocabularies exceeds 500. When the target text is detected to contain 800 characters, the modified vocabularies whose distance from the word segmentation vocabulary is within 5 characters are acquired; when the number of characters in the target text grows beyond 1000, the character number threshold is adjusted to 10 characters, and the modified vocabularies whose distance from the word segmentation vocabulary is within 10 characters are acquired.
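The threshold modes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names `choose_threshold` and `find_modifiers`, the `before_only` flag (standing in for mode 2), and the simple substring search against a known list of modified vocabularies are all hypothetical. Only the values 5, 10, 1000 and 500 come from the example in the text.

```python
from typing import Iterable, List, Tuple


def choose_threshold(num_chars: int, num_words: int) -> int:
    """Mode 3: derive the character number threshold from the text size.

    The cut-offs (1000 characters, 500 words) and the values 5 and 10
    are taken from the example in the text.
    """
    if num_chars > 1000 or num_words > 500:
        return 10
    return 5


def find_modifiers(
    text: str,
    word_start: int,
    word_end: int,
    modifiers: Iterable[str],
    threshold: int,
    before_only: bool = False,
) -> List[Tuple[str, int]]:
    """Return (modifier, position) pairs found within `threshold` characters
    of the word occupying text[word_start:word_end].

    With before_only=True only the characters preceding the word are
    searched (mode 2); otherwise both sides are searched (mode 1).
    """
    spans = [(max(0, word_start - threshold), word_start)]
    if not before_only:
        spans.append((word_end, min(len(text), word_end + threshold)))

    found = []
    for span_start, span_end in spans:
        window = text[span_start:span_end]
        for modifier in modifiers:
            idx = window.find(modifier)
            while idx != -1:
                found.append((modifier, span_start + idx))
                idx = window.find(modifier, idx + 1)
    # Report modifiers in the order they appear in the text.
    return sorted(found, key=lambda pair: pair[1])
```

A caller would first compute the threshold from the text size and then query the window around each word segmentation vocabulary; English letters are counted as characters here, as described for English text.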
Step 240, determining the emotion degree corresponding to the word segmentation vocabulary based on the modified vocabulary and the matching relation between the word segmentation vocabulary and the emotion vocabulary library.
The emotion vocabulary library comprises emotion vocabularies labeled with basic emotion degrees.
Optionally, the basic emotion degree refers to the emotion degree directly expressed by an emotion vocabulary. At least one emotion vocabulary in the emotion vocabulary library has a corresponding basic emotion degree, which may include the emotional tendency (such as positive, negative, or neutral) and the strength of that tendency (such as extremely positive, very positive, or relatively positive).
In an alternative embodiment, the emotion vocabularies in the emotion vocabulary library comprise positive vocabularies, negative vocabularies and neutral vocabularies, wherein the basic emotion degree of a positive vocabulary is a positive number, the basic emotion degree of a negative vocabulary is a negative number, and the basic emotion degree of a neutral vocabulary is 0. For example, one emotion vocabulary in the library is "praise", with a basic emotion degree of 0.4410439, which indicates that it is a positive vocabulary with a fairly strong emotional tendency; another emotion vocabulary in the library is "chill", with a basic emotion degree of -0.61774685, which indicates that it is a negative vocabulary with a very strong emotional tendency.
Illustratively, the source of the library of emotional words includes at least the following:
1. The emotion vocabulary library is preset.
Illustratively, the emotion vocabulary library is obtained by integrating the emotion annotations of several existing vocabulary libraries.
2. The emotion vocabulary library is obtained by training in different ways according to the type of the target text.
Illustratively, the emotion degree expressed when the target text is an academic paper is very likely to differ from the emotion degree expressed when the target text is a web comment. When the target text is an academic paper, as many paper texts as possible are collected and subjected to word segmentation, labeling, training with a text training model, and so on, to obtain a paper emotion vocabulary library; when the target text is a web text, as many web texts as possible are collected and processed in the same way to obtain a web emotion vocabulary library. The emotion vocabulary library to be used is then selected according to the type of the target text to be analyzed. For example, when the target text to be analyzed is a web text, the web emotion vocabulary library is used as the emotion vocabulary library for analyzing the target text.
It is to be noted that the above methods may be used not only alone but also in combination. The above description is merely exemplary, and the present disclosure is not limited thereto.
Based on the obtained emotion vocabulary library, the word segmentation vocabulary is matched against the library in at least one of the following ways:
1. A character matching method is used.
Illustratively, the word segmentation vocabulary is matched character by character against the emotion vocabularies in the emotion vocabulary library. When every character of a word segmentation vocabulary matches the corresponding character of an emotion vocabulary in the library, the word segmentation vocabulary is determined to match that emotion vocabulary.
2. A vector matching method is used.
Illustratively, the word segmentation vocabulary and the emotion vocabularies in the library are compared as vectors. When the similarity between the word segmentation vocabulary and an emotion vocabulary in the library reaches a similarity threshold, the word segmentation vocabulary is determined to match that emotion vocabulary.
For example, a Word2Vec (Word to Vector) model, which is a kind of language model, is adopted to generate word vectors from the word segmentation vocabularies. Word2Vec has two training models: the Continuous Bag of Words (CBOW) model and the Skip-Gram model. The CBOW model predicts the current word from its context, and the Skip-Gram model predicts the context from the current word. Optionally, the CBOW model is used to obtain the vector representations of the word segmentation vocabularies and of the emotion vocabularies in the emotion vocabulary library.
The CBOW model comprises an input layer, a hidden layer and an output layer. The input layer consists of the labeled word segmentation vocabularies $\{x_1, \ldots, x_i, \ldots, x_C\}$, where C is the number of word segmentation vocabularies, V is the number of emotion vocabularies in the emotion vocabulary list, and $x_i$ is the i-th word segmentation vocabulary. The hidden layer is an N-dimensional vector. The output layer gives the word vectors and emotion vectors corresponding to the word segmentation vocabularies and the emotion vocabularies. The input layer is connected to the hidden layer through a V × N weight matrix W, and the hidden layer is connected to the output layer through an N × V weight matrix W'.
It is assumed that the sizes of the input and output weight matrices are known.
The first step is to compute the output h of the hidden layer, as follows:
$$h = \frac{1}{C} W_I^{\top} \left( x_1 + x_2 + \cdots + x_C \right)$$
where $W_I$ is the input weight matrix, and the output $h$ is the weighted average of the input vectors.
The second step is to compute the input $u_j$ at each node of the output layer, as follows:
$$u_j = W_{O,j}^{\top} h$$
where $W_{O,j}$ is the j-th column of the output matrix $W_O$, and $u_j$ is the input to the j-th node of the output layer.
Finally, the output $y_j$ of the output layer is computed as follows:
$$y_j = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})}$$
softmax may be selected as the activation function when computing the output of the output layer. Finally, the weights of the model are updated by computing a cross-entropy loss function, so as to obtain more accurate vocabulary vectors.
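The three computation steps above (hidden layer, output-layer input, softmax output) can be traced with a small pure-Python forward pass. This is a sketch under assumptions: the function name `cbow_forward` is hypothetical, the inputs are taken to be one-hot vectors so that each input simply selects a row of the input weight matrix, and the weight update by cross-entropy loss is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cbow_forward(
    context_ids: Sequence[int],
    w_in: List[List[float]],   # V x N input weight matrix
    w_out: List[List[float]],  # N x V output weight matrix
) -> Tuple[List[float], List[float]]:
    """One CBOW forward pass for C one-hot context words.

    Returns the hidden vector h (length N) and the softmax output y (length V).
    """
    c = len(context_ids)
    n = len(w_in[0])
    v = len(w_out[0])

    # Step 1: with one-hot inputs, multiplying by the input matrix selects a
    # row per context word; h is the average of those rows.
    h = [sum(w_in[i][k] for i in context_ids) / c for k in range(n)]

    # Step 2: u_j is the dot product of the j-th column of w_out with h.
    u = [sum(w_out[k][j] * h[k] for k in range(n)) for j in range(v)]

    # Step 3: softmax over u (shifted by the maximum for numerical stability).
    shift = max(u)
    exps = [math.exp(value - shift) for value in u]
    total = sum(exps)
    y = [e / total for e in exps]
    return h, y
```

The returned `y` is a probability distribution over the V vocabulary entries; training would compare it with the true centre word through a cross-entropy loss.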
Optionally, by training Word2Vec on the input text, each word segmentation vocabulary in the text and each emotion vocabulary in the emotion vocabulary library is mapped to a low-dimensional dense vector of fixed dimension. The emotional similarity between a word segmentation vocabulary and an emotion vocabulary can then be determined by comparing the distance between the word vector of the word segmentation vocabulary and the emotion vector of the emotion vocabulary.
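A vector comparison of this kind can be sketched as follows. The text does not name a distance measure, so cosine similarity is used here as an assumption; the function names `cosine_similarity` and `best_match` and the default threshold of 0.8 are likewise hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    word_vector: Sequence[float],
    emotion_vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed similarity threshold
) -> Optional[Tuple[str, float]]:
    """Return the most similar emotion vocabulary and its similarity,
    or None when no emotion vocabulary reaches the threshold."""
    best_word, best_sim = None, -1.0
    for emotion_word, emotion_vector in emotion_vectors.items():
        sim = cosine_similarity(word_vector, emotion_vector)
        if sim > best_sim:
            best_word, best_sim = emotion_word, sim
    if best_word is not None and best_sim >= threshold:
        return best_word, best_sim
    return None
```

In practice the vectors would come from the trained Word2Vec model; the two-dimensional vectors used in any quick experiment are only stand-ins.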
After the word segmentation vocabularies are matched against the emotion vocabulary library, the correspondence between the word segmentation vocabularies and the emotion vocabularies in the library is obtained. For example, the target text includes the word segmentation vocabulary "happy" and the emotion vocabulary library also stores the emotion vocabulary "happy"; a matching relation between the two can therefore be established, and the emotion information stored for "happy" in the library can be applied to the word segmentation vocabulary in the target text. Optionally, when a word segmentation vocabulary is preceded by a modified vocabulary that modifies it, the information carried by the modified vocabulary may affect the emotion information of the word segmentation vocabulary. For example, although the word segmentation vocabulary is identical to an emotion vocabulary in the library, the emotion degree corresponding to the word segmentation vocabulary is adjusted because a modified vocabulary such as "not" precedes it.
Step 250, determining an emotion analysis result corresponding to the target text based on the emotion degrees of the word segmentation vocabularies in the target text.
Optionally, each word segmentation vocabulary in the target text is matched with an emotion vocabulary in the emotion vocabulary library and adjusted by its modified vocabulary, so that the corresponding emotion degree is obtained. The emotion degree obtained in this way takes into account not only the original meaning of the vocabulary but also, through the adjustment by the modified vocabulary, the context of the target text. Therefore, the emotion analysis result corresponding to the target text can be determined by analyzing the emotion degrees of the word segmentation vocabularies.
Illustratively, the basic emotion degrees of the emotion vocabularies in the library are represented by numbers, and the emotion degrees derived from them are also represented by numbers. In addition, the word segmentation vocabularies of the target text are related to one another, so the emotion analysis result of the target text can be obtained by numerical calculation.
In an optional embodiment, performing summation operation on the emotion degrees of word segmentation vocabularies in a target text to obtain an emotion degree score corresponding to the target text; and determining the emotion analysis result corresponding to the target text based on the emotion degree score corresponding to the target text.
Because the target text consists of at least one word segmentation vocabulary, and the emotion degrees of the word segmentation vocabularies reflect the emotion degree of the target text, the emotion degree score is computed from those emotion degrees, and the emotion analysis result is determined from the resulting score. Optionally, the emotion analysis result uses the same criterion as the emotion vocabulary library: when the emotion degree score is positive, the emotion analysis result is a positive result; when the emotion degree score is negative, the emotion analysis result is a negative result; when the emotion degree score is 0, the emotion analysis result is a neutral result.
Illustratively, the target text is a comment on a social platform, and 5 word segmentation vocabularies are obtained after word segmentation. After matching against the emotion vocabulary library and adjustment by the modified vocabularies, 5 emotion degrees are obtained, one for each word segmentation vocabulary, in order: 0.52, 0.18, 0, -0.32 and -0.12. That is, the five word segmentation vocabularies are, in order, a positive vocabulary, a positive vocabulary, a neutral vocabulary, a negative vocabulary and a negative vocabulary. The emotion degrees of the 5 word segmentation vocabularies are then added: 0.52 + 0.18 + 0 + (-0.32) + (-0.12) = 0.26, so the emotion degree score of the target text is 0.26. Because 0.26 is a positive number, the emotion analysis result determined from the score is that the target text is a positive comment.
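The summation and sign rule can be reproduced with a few lines of Python. This is a sketch only: the function name `analyze` and the result labels are hypothetical, and rounding the score to six decimal places is an added assumption to keep floating-point noise from changing the sign of a score that should be exactly 0.

```python
from typing import Iterable, Tuple


def analyze(emotion_degrees: Iterable[float]) -> Tuple[float, str]:
    """Sum the per-word emotion degrees and classify the text by the sign."""
    # Rounding is an assumption of this sketch, not part of the described method.
    score = round(sum(emotion_degrees), 6)
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"


# The five emotion degrees from the social-platform comment example.
score, result = analyze([0.52, 0.18, 0, -0.32, -0.12])
```

For the example values the score is approximately 0.26 and the result is "positive", matching the worked example.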
In summary, in the method provided in this embodiment, word segmentation is first performed on the acquired target text to obtain word segmentation vocabularies, and modified vocabularies are then searched for within a certain number of characters of each word segmentation vocabulary. Because a modified vocabulary modifies the word segmentation vocabulary and thereby affects its basic emotion degree, analyzing the word segmentation vocabulary together with its modified vocabulary yields a more accurate emotion degree. Finally, the emotion analysis result of the target text is determined based on the emotion degrees of the word segmentation vocabularies in the target text. By combining the word segmentation vocabularies with the modified vocabularies found within the character number threshold range, the search range for modified vocabularies is well defined, more factors are taken into account, and the emotion analysis result of the target text is more accurate.
In an alternative embodiment, the modified vocabulary has a significant influence on the emotion degree corresponding to the word segmentation vocabulary, and different modified vocabularies lead to different ways of analyzing the emotion degree. Illustratively, as shown in fig. 3, step 240 in the embodiment shown in fig. 2 can also be implemented as the following steps 310 to 320.
Step 310, in response to a matching relation existing between the word segmentation vocabulary and an emotion vocabulary in the emotion vocabulary library, acquiring the basic emotion degree corresponding to the emotion vocabulary.
Illustratively, the emotion vocabulary library uses an existing emotion dictionary, selected by comparing the characteristics of several existing emotion dictionaries. The library contains 28803 emotion vocabularies in total, each with a corresponding emotion degree, which is sufficient for analyzing the target text. Illustratively, in the emotion vocabulary library the emotion degree of a positive emotion vocabulary is greater than 0, the emotion degree of a negative emotion vocabulary is less than 0, and the emotion degree of a neutral vocabulary is 0; the larger the absolute value of the emotion degree, the stronger the emotional tendency. Each emotion vocabulary and its emotion degree are separated by a comma. Examples of entries in the emotion vocabulary library are: nausea, -0.267935; happy, 0.681309; camouflage, -0.5.
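Loading such a comma-separated dictionary might look like the sketch below. The one-entry-per-line layout, the function names `load_lexicon` and `tendency`, and the decision to skip lines without a comma are assumptions; the text only states that each emotion vocabulary and its emotion degree are separated by a comma.

```python
from typing import Dict, Iterable


def load_lexicon(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'word,degree' entries into a word -> basic emotion degree map."""
    lexicon: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if "," not in line:
            continue  # skip blank or malformed lines (assumption of this sketch)
        # Split on the last comma so the degree is always the final field.
        word, _, degree = line.rpartition(",")
        lexicon[word.strip()] = float(degree)
    return lexicon


def tendency(degree: float) -> str:
    """Map a basic emotion degree to its emotional tendency by sign."""
    if degree > 0:
        return "positive"
    if degree < 0:
        return "negative"
    return "neutral"


# The three example entries given in the text.
lexicon = load_lexicon(["nausea, -0.267935", "happy, 0.681309", "camouflage, -0.5"])
```

The absolute value of each stored degree can be read directly as the strength of the tendency.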
In an alternative embodiment, the matching relationship between the word segmentation vocabulary and the emotion vocabulary in the emotion vocabulary library at least comprises the following conditions:
1. In response to the emotion vocabulary library including an emotion vocabulary identical to the word segmentation vocabulary, acquiring the basic emotion degree corresponding to that emotion vocabulary.
Illustratively, the word segmentation vocabularies of the target text are compared with the emotion vocabularies in the emotion vocabulary library. When many, or even all, of the word segmentation vocabularies in the target text are identical to emotion vocabularies in the library, the basic emotion degree of each identical emotion vocabulary is looked up and used directly as the basic emotion degree of the corresponding word segmentation vocabulary.
2. In response to the emotion vocabulary library not including an emotion vocabulary identical to the word segmentation vocabulary, acquiring a first candidate emotion vocabulary, which is the emotion vocabulary in the library that has the highest similarity to the word segmentation vocabulary and reaches a matching similarity threshold, and determining the basic emotion degree corresponding to the word segmentation vocabulary based on the basic emotion degree of the first candidate emotion vocabulary and the similarity between the first candidate emotion vocabulary and the word segmentation vocabulary.
Even if the emotion vocabulary library contains many emotion vocabularies, the emotion vocabularies it contains are still limited because natural language expression is so diverse, and no complete, general-purpose emotion vocabulary library exists in the field of text emotion analysis. Illustratively, when the word segmentation vocabularies of the target text are compared with the emotion vocabularies in the library, some word segmentation vocabularies in the target text may have no identical emotion vocabulary in the library, so their basic emotion degrees cannot be obtained directly.
Optionally, in this case, a vocabulary vector B may be obtained by vectorizing a word segmentation vocabulary A that exists in the target text but has no identical emotion vocabulary in the library. The vocabulary vector B is then compared with the emotion vectors of the emotion vocabularies in the library to obtain similarities, where A may denote a single word segmentation vocabulary or a set of several word segmentation vocabularies. In an alternative embodiment, the comparison used to obtain the similarity includes at least one of the following ways:
1. The word segmentation vocabulary A is compared with the emotion vocabularies in the emotion vocabulary library one by one.
Illustratively, the vocabulary vector B corresponding to the word segmentation vocabulary A is obtained and compared with the emotion vector of each emotion vocabulary in the library, so that the similarity between the vocabulary vector B and each emotion vector is obtained.
2. The word segmentation vocabularies in the target text and the emotion vocabulary library are processed before the comparison.
Comparing the vocabulary vector B with the emotion vector of every emotion vocabulary in the library involves a large amount of computation and places a heavy load on the terminal. Since the goal is to determine the emotional tendency of the target text, the word segmentation vocabularies and the emotion vocabulary library may be processed first. Illustratively, after the word segmentation vocabulary A is determined, the word segmentation vocabularies that do have an identical emotion vocabulary in the library are extracted as a set C, and the vocabulary vector B corresponding to the word segmentation vocabulary A is then compared for similarity with the word vector of at least one word segmentation vocabulary in the set C.
3. The word segmentation vocabulary A is compared with a specific section of the emotion vocabulary library.
Illustratively, the emotion vocabulary library includes several sections, each representing a different category. For example, the entertainment section contains a large number of emotion vocabularies related to entertainment, the sports section contains a large number of emotion vocabularies related to sports, and the food section contains a large number of emotion vocabularies related to food reviews. When the word segmentation vocabulary A is compared for similarity with the emotion vocabularies in the library, the category of the target text or of the word segmentation vocabulary (such as entertainment, sports, or food) is determined first, and the word segmentation vocabulary A is then compared only with the section of the library for that category, so that the similarity between the vocabulary vector B corresponding to the word segmentation vocabulary A and the emotion vector of each emotion vocabulary in that section is obtained.
The above description is only exemplary, and the present invention is not limited to the above description.
After the comparison, the basic emotion degree of the emotion vocabulary in the library that has the highest similarity to the word segmentation vocabulary and reaches the matching similarity threshold may be used directly as the basic emotion degree of the word segmentation vocabulary. Alternatively, that emotion vocabulary is taken as the first candidate emotion vocabulary, and the basic emotion degree of the word segmentation vocabulary is determined by an operation on the similarity and the basic emotion degree of the first candidate emotion vocabulary.
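The two matching cases can be combined into one lookup, sketched below. Several details are assumptions made for illustration: the function name `base_degree`, the caller-supplied `similarity` callable, the default threshold of 0.6, multiplication as the "operation" on the similarity and the candidate's basic emotion degree (the text does not specify the operation), and returning `None` when no candidate reaches the threshold.

```python
from typing import Callable, Dict, Optional


def base_degree(
    word: str,
    lexicon: Dict[str, float],
    similarity: Callable[[str, str], float],
    threshold: float = 0.6,  # assumed matching similarity threshold
) -> Optional[float]:
    """Return the basic emotion degree for a word segmentation vocabulary."""
    # Case 1: an identical emotion vocabulary exists in the library.
    if word in lexicon:
        return lexicon[word]

    # Case 2: find the first candidate emotion vocabulary, i.e. the most
    # similar emotion vocabulary, provided it reaches the threshold.
    candidate, best_sim = None, float("-inf")
    for emotion_word in lexicon:
        sim = similarity(word, emotion_word)
        if sim > best_sim:
            candidate, best_sim = emotion_word, sim
    if candidate is None or best_sim < threshold:
        return None  # no usable match; handling is left to the caller

    # Assumed operation: scale the candidate's degree by the similarity.
    return lexicon[candidate] * best_sim
```

Passing a function that always returns 1.0 for the best candidate reduces the second case to the first variant described above, where the candidate's basic emotion degree is used unchanged.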
Step 320, adjusting the basic emotion degree based on the modified vocabulary to obtain the emotion degree corresponding to the word segmentation vocabulary.
Wherein the modified vocabulary includes at least one of negative words and degree adverbs.
Illustratively, when the modified vocabulary that modifies a word segmentation vocabulary is a negative word, the emotion information presented by the sentence containing the word segmentation vocabulary is reversed. For example, when the word segmentation vocabulary is a positive vocabulary and the negative word "not" precedes it, the emotion information presented by the sentence becomes a negative expression. Alternatively, when the modified vocabulary is a degree adverb, the intensity of the emotion information presented by the sentence may change (for example, the emotional expression becomes stronger or weaker). For example, when the word segmentation vocabulary is a positive vocabulary and the degree adverb "extraordinary" precedes it, the emotion information presented by the sentence is expressed more strongly. Therefore, after the modified vocabulary is determined, the basic emotion degree of the word segmentation vocabulary needs to be adjusted according to the type of the modified vocabulary.
In an alternative embodiment, in response to the modified vocabulary including a negative word, a first adjustment is made to the base emotion level; and performing second adjustment corresponding to the degree adverb on the basic emotion degree in response to the fact that the degree adverb is included in the modified vocabulary.
Illustratively, the first adjustment to the base emotional degree includes at least one of:
1. Taking the opposite number of the value of the basic emotion degree.
Illustratively, when the modified vocabulary corresponding to the word segmentation vocabulary is a negative word, the basic emotion degree obtained by comparing the word segmentation vocabulary with the emotion vocabulary library is converted to its opposite number. For example, the basic emotion degree corresponding to the word segmentation vocabulary is 0.5; when the modified vocabulary is the negative word "not", the opposite number of the basic emotion degree is taken to obtain the adjusted emotion degree -0.5, which is the emotion degree corresponding to the word segmentation vocabulary. Taking the opposite number is one first adjustment method.
2. Setting the basic emotion degree to 0.
The word segmentation vocabulary may correspond to several modified vocabularies. When these are negative words and their number is a nonzero even number, double negation may make the analysis result inaccurate. To avoid this problem, which arises from directly taking the opposite number of the basic emotion degree, another adjustment method may be adopted. Illustratively, when the modified vocabulary corresponding to the word segmentation vocabulary is a negative word, the basic emotion degree obtained by comparing the word segmentation vocabulary with the emotion vocabulary library is set to 0. For example, when the modified vocabulary is the negative word "not", the basic emotion degree corresponding to the word segmentation vocabulary is set to 0 to obtain the adjusted emotion degree 0, which is the emotion degree corresponding to the word segmentation vocabulary. Setting to 0 is another first adjustment method.
Illustratively, the second adjustment to the base emotional degree includes at least one of:
1. Doubling the value of the basic emotion degree.
Illustratively, when the modified vocabulary corresponding to the word segmentation vocabulary is a degree adverb, the basic emotion degree obtained by comparing the word segmentation vocabulary with the emotion vocabulary library is doubled. For example, the basic emotion degree corresponding to the word segmentation vocabulary is 0.26; when the modified vocabulary is a degree adverb, the basic emotion degree is doubled to obtain the adjusted emotion degree 0.52, which is the emotion degree corresponding to the word segmentation vocabulary. Doubling the value is one second adjustment method.
2. Performing a numerical operation on the value of the basic emotion degree.
Because different degree adverbs express different emotional intensities, the basic emotion degree of the word segmentation vocabulary may be changed by different amounts depending on the degree adverb. Illustratively, when the degree adverb corresponding to the word segmentation vocabulary expresses a weaker intensity, a smaller value is applied to the basic emotion degree obtained by comparison with the emotion vocabulary library; when the degree adverb expresses a stronger intensity, a larger value is applied. The intensities and their corresponding values may be determined in advance, for example by defining an emotion intensity table for degree adverbs that lists the operation value for each of several degree adverbs.
For example, the basic emotion degree corresponding to the word segmentation vocabulary is 0.32. When the modified vocabulary before the word segmentation vocabulary is the degree adverb "extraordinary", 0.5 is added to the basic emotion degree to obtain the adjusted emotion degree 0.82, which is the emotion degree corresponding to the word segmentation vocabulary when the modified vocabulary is "extraordinary". When the modified vocabulary before the word segmentation vocabulary is the degree adverb "comparative", 0.2 is added to the basic emotion degree to obtain the adjusted emotion degree 0.52, which is the emotion degree corresponding to the word segmentation vocabulary when the modified vocabulary is "comparative". This numerical operation is another second adjustment method.
The above description is merely exemplary, and the present disclosure is not limited thereto.
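The correspondence table described above can be sketched as a simple lookup. This is a minimal illustration, not the claimed implementation: the names `DEGREE_ADVERB_OFFSETS` and `adjust_by_degree_adverb` are hypothetical, and only the two table entries (0.5 and 0.2) and the basic emotion degree 0.32 come from the example in the text.

```python
# Hypothetical correspondence table: degree adverb -> value added to the
# basic emotion degree. Only these two entries appear in the text.
DEGREE_ADVERB_OFFSETS = {
    "extraordinary": 0.5,
    "comparative": 0.2,
}


def adjust_by_degree_adverb(base_degree, adverb):
    """Return the emotion degree after the additive second adjustment."""
    # Assumption: adverbs missing from the table leave the degree unchanged.
    return base_degree + DEGREE_ADVERB_OFFSETS.get(adverb, 0.0)


strong = adjust_by_degree_adverb(0.32, "extraordinary")
mild = adjust_by_degree_adverb(0.32, "comparative")
print(round(strong, 2), round(mild, 2))
```

Rounding is applied only for display, since binary floating-point addition may not reproduce the decimal values exactly.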
In an optional embodiment, in response to the word segmentation vocabulary corresponding to at least two modified vocabularies, the basic emotion degree is subjected to superposition adjustment based on the adjustment modes respectively corresponding to the at least two modified vocabularies.
Optionally, the modified vocabulary corresponding to the segmented vocabulary can take different forms, which also affects how the basic emotion degree corresponding to the segmented vocabulary is adjusted. When two or more modified vocabularies correspond to the segmented vocabulary, the method includes at least the following processing modes:
1. The word segmentation vocabulary corresponds to two or more modified vocabularies in the form of negative words.
Schematically, when two or more modified vocabularies corresponding to the word segmentation vocabulary are negative words, the first adjustment is applied to the basic emotion degree of the word segmentation vocabulary a number of times corresponding to the number of modified vocabularies. When the first adjustment is the "0 setting operation", the basic emotion degree can be regarded as being set to 0 only once, and the remaining modified vocabularies have no further effect on it. Alternatively, when the first adjustment is the "inverse number extracting operation" (that is, taking the opposite number), the number of times the operation is applied is determined by the number of negative words in the modified vocabulary: when that number is odd, the resulting emotion degree is the opposite of the basic emotion degree value; when it is even, the resulting emotion degree is the same as the basic emotion degree value.
2. The word segmentation vocabulary corresponds to two or more modified vocabularies in the form of degree adverbs.
Illustratively, when two or more modified vocabularies corresponding to the segmented vocabulary are degree adverbs, the second adjustment is applied to the basic emotion degree of the segmented vocabulary. When the second adjustment is the "numerical doubling operation", it may be applied a number of times corresponding to the number of modified vocabularies; that is, the more modified vocabularies there are, the stronger the emotion degree corresponding to the segmented vocabulary.
3. The word segmentation vocabulary corresponds to two or more modified vocabularies of different forms.
Illustratively, the word segmentation vocabulary has two or more modified vocabularies that include both degree adverbs and negative words. The different modified vocabularies modify the word segmentation vocabulary at different levels, and each of them affects its basic emotion degree. Therefore, the basic emotion degree can be adjusted using the adjustment mode corresponding to each modified vocabulary. For example, when the word segmentation vocabulary corresponds to two modified vocabularies, one a degree adverb and the other a negative word, the first adjustment is applied because the modified vocabularies include a negative word, and the second adjustment is applied because they include a degree adverb. Illustratively, when the first adjustment mode is the "0 setting operation", the basic emotion degree corresponding to the word segmentation vocabulary may be directly set to 0 to obtain the emotion degree corresponding to the word segmentation vocabulary.
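The three processing modes above can be sketched in one function. This is an illustrative sketch under stated assumptions: the function name, the `"negation"` / `"degree"` tags and the `negation_mode` parameter are hypothetical, and the second adjustment is taken to be the "numerical doubling operation". With doubling and sign inversion the order of the two adjustments does not change the result, so no ordering is implied.

```python
def adjust_for_modifiers(base_degree, modifiers, negation_mode="zero"):
    """Superpose the adjustments for every modifier of one segmented word.

    modifiers: list of "negation" / "degree" tags (hypothetical encoding).
    negation_mode: "zero" for the 0 setting operation, "invert" for
    taking the opposite number.
    """
    negations = modifiers.count("negation")
    degree_adverbs = modifiers.count("degree")

    # Second adjustment: one doubling per degree adverb.
    degree = base_degree * (2 ** degree_adverbs)

    if negations == 0:
        return degree
    if negation_mode == "zero":
        # Setting to 0 once is enough; further negative words change nothing.
        return 0.0
    # Inversion: an odd count flips the sign, an even count restores it.
    return -degree if negations % 2 == 1 else degree


print(adjust_for_modifiers(0.3, ["negation", "negation"], "invert"))
print(adjust_for_modifiers(0.25, ["degree", "degree"]))
print(adjust_for_modifiers(0.3, ["negation", "degree"]))
```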
In summary, word segmentation processing is performed on an obtained target text to obtain word segmentation words, then modified words are searched in a range with a certain distance from the word segmentation words, the modified words are used for modifying the word segmentation words, so that basic emotion degrees of the word segmentation words can be affected, then the word segmentation words are analyzed in combination with the modified words, more accurate emotion degrees corresponding to the word segmentation words can be obtained, and finally, an emotion analysis result of the target text is determined based on the emotion degrees corresponding to a plurality of word segmentation words in the target text. Through the method, the word segmentation words and the modified words in a certain character number threshold range are combined and analyzed, and then the analysis result of the target text emotion is determined, so that when the target text is subjected to emotion analysis, the range of the modified words is more determined, the considered factors are more comprehensive, and the analysis result of the target text is more accurate.
In the method provided by this embodiment, considering the influence of the modified vocabulary corresponding to the segmented vocabulary on the basic emotion degree corresponding to the segmented vocabulary, the basic emotion degree of the segmented vocabulary is changed by different methods according to different modified vocabularies, such as: when the modified vocabulary is a negative word, the '0 setting operation' can be carried out on the basic emotion degree of the corresponding word segmentation vocabulary; or when the modified vocabulary is the degree adverb, the numerical value doubling operation can be carried out on the basic emotion degree of the corresponding participle vocabulary, and the accuracy of the result of emotion analysis on the target text can be improved in such a way.
In an alternative embodiment, although performing emotion analysis on the target text based on an existing emotion vocabulary library is convenient, the library may not include some of the word segmentation words in the target text. Therefore, the emotion vocabulary library is updated before emotion analysis is performed on the target text, or during day-to-day use of the library, so that emotion analysis can be performed better. There are several ways to update the emotion vocabulary library; illustratively, they include: (1) the direct comparison method, in which the word segmentation vocabulary is compared directly with the emotion vocabulary; and (2) the treatment comparison method, in which the emotion vocabulary library is processed first and the word segmentation vocabulary is then compared with the emotion vocabulary. As shown in fig. 4, step 310 in the embodiment shown in fig. 3 further includes the following steps 410 to 450.
(1) Direct comparison method: directly comparing word segmentation vocabulary with emotion vocabulary
Step 410, responding to the mismatching of the participle words and the emotion words in the emotion word library, and determining a second candidate emotion word with the highest similarity with the participle words in the emotion word library.
After the target text is segmented and the resulting segmented words are matched against the emotion vocabulary library, the emotion words in the library may fail to match every segmented word; that is, some segmented words are not included in the emotion vocabulary library. In this case, similarity can be compared between those segmented words and the emotion vocabulary library. A Word2Vec model can be used for the comparison: the model is trained to obtain a low-dimensional word vector for each segmented word in the target text. Each word vector is then compared with the emotion vectors corresponding to the emotion words in the library; the closer a word vector is to an emotion vector, the higher the similarity between the two. In this way, even though the segmented word differs from every emotion word, the emotion word most similar to the segmented word can be found.
The similarity can be computed as the cosine similarity between word vectors. Given the vectors $A = (A_1, A_2, \ldots, A_n)$ and $B = (B_1, B_2, \ldots, B_n)$, the cosine similarity is:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
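As a minimal sketch, the cosine similarity between two word vectors can be computed with the standard library alone. The function name is illustrative, and returning 0.0 for a zero vector is an assumption of this sketch, since the text does not discuss that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```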
In an alternative embodiment, if only the ranking of similarity between the segmented words and the emotion words is considered, the highest similarity may still be low, so that the emotion degree corresponding to the segmented word cannot be predicted correctly. Therefore, in addition to the similarity ranking, whether the similarity satisfies a condition is also checked; that is, the second candidate emotion word is the emotion word in the emotion vocabulary library that has the highest similarity with the segmented word and whose similarity reaches the update similarity threshold.
Illustratively, after the similarity between each segmented word and the emotion words in the emotion vocabulary library is calculated with the similarity formula, the emotion word with the highest similarity to each segmented word, and that similarity, are determined. The similarity is then compared numerically with a preset similarity threshold. When the similarity reaches or exceeds the threshold, the emotion word is taken as the second candidate emotion word; when it does not, the similarity may be recalculated, or the influence of that segmented word's emotion degree on the emotion of the target text may be ignored. For example, the similarity threshold is preset to 0.6, and 4 segmented words are not included in the emotion vocabulary library. After these 4 segmented words are compared with the emotion words in the library, their highest similarities are 0.92, 0.81, 0.62 and 0.57 in turn. Comparing the 4 values with the threshold 0.6, the emotion words corresponding to the first three similarities, which exceed 0.6, are taken as second candidate emotion words.
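The threshold check in this example can be sketched as a filter over the best match of each unmatched word. The function name and the placeholder words `w1`..`w4` and `e1`..`e4` are hypothetical; only the four similarity values and the threshold 0.6 come from the text.

```python
def select_second_candidates(best_matches, threshold=0.6):
    """Keep unmatched words whose best similarity reaches the threshold.

    best_matches maps each unmatched segmented word to a tuple of
    (most similar emotion word, similarity).
    """
    return {
        word: (emotion_word, sim)
        for word, (emotion_word, sim) in best_matches.items()
        if sim >= threshold
    }


# Placeholder words; only the similarity values are taken from the example.
best = {
    "w1": ("e1", 0.92),
    "w2": ("e2", 0.81),
    "w3": ("e3", 0.62),
    "w4": ("e4", 0.57),
}
candidates = select_second_candidates(best)
print(sorted(candidates))  # ['w1', 'w2', 'w3']
```

The fourth word (similarity 0.57) is dropped, which corresponds to the option of ignoring its influence on the target text.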
And step 420, adding the participle words to an emotion word library based on the basic emotion degree of the second candidate emotion words, and updating the emotion word library.
Illustratively, once the second candidate emotion word is obtained, the similarity between the participle word and the second candidate emotion word, and the basic emotion degree of the second candidate emotion word, are both known. To update the emotion vocabulary library, the basic emotion degree of the participle word must then be determined.
In an optional embodiment, the product of the similarity between the second candidate emotion word and the participle word and the basic emotion degree of the second candidate emotion word is determined as the basic emotion degree corresponding to the participle word; and adding the word segmentation words to an emotion word library based on the basic emotion degrees corresponding to the word segmentation words.
Illustratively, the basic emotion degree of the participle word is obtained by multiplying the known similarity between the second candidate emotion word and the participle word by the known basic emotion degree of the second candidate emotion word. For example, if the similarity between the second candidate emotion word and the participle word is 0.82 and the basic emotion degree of the second candidate emotion word is 0.4562, the basic emotion degree of the participle word is the product of the two values, that is, 0.82 × 0.4562 = 0.374084. The participle word and its basic emotion degree 0.374084 are then added to the emotion vocabulary library as a pair, which realizes the update operation of the emotion vocabulary library.
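A minimal sketch of this update, assuming the emotion vocabulary library is held as a dictionary from word to basic emotion degree. The function name and the placeholder words are hypothetical; the numbers 0.82 and 0.4562 are those of the example.

```python
def add_word_to_lexicon(lexicon, word, candidate, similarity):
    """Add `word` with degree = similarity * basic degree of `candidate`."""
    degree = similarity * lexicon[candidate]
    lexicon[word] = degree
    return degree


# Placeholder entries: "candidate_word" stands for the second candidate
# emotion word, "new_word" for the unmatched segmented word.
lexicon = {"candidate_word": 0.4562}
new_degree = add_word_to_lexicon(lexicon, "new_word", "candidate_word", 0.82)
print(round(new_degree, 6))
```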
(2) Treatment comparison method: after the emotional vocabulary library is processed, the participle vocabulary and the emotional vocabulary are compared
And step 430, determining a word segmentation vocabulary set matched with the emotion vocabulary library in the target text.
Illustratively, after the word segmentation vocabularies of the target text are matched against the emotion vocabularies in the emotion vocabulary library, a word segmentation vocabulary set is obtained, which includes at least one word segmentation vocabulary that is the same as an emotion vocabulary. For example, the word segmentation vocabularies in the target text are a1, a2, a3, a4 and a5. After they are compared with the emotion vocabularies in the emotion vocabulary library, a1, a2 and a3 are determined to be present both among the word segmentation vocabularies and in the emotion vocabulary library, so a1, a2 and a3 are taken as the word segmentation vocabulary set.
Step 440, responding to the mismatching of the participle vocabulary and the emotion vocabulary library, and determining a third candidate emotion word with the highest similarity with the participle vocabulary in the participle vocabulary set.
Illustratively, the segmented words in the target text are not all included in the emotion vocabulary library. When the segmented words are matched against the emotion vocabularies in the library, a difference vocabulary set may therefore be obtained, which includes at least one segmented word that does not exist in the emotion vocabulary library. Similarity is then compared between each participle word in the difference vocabulary set and the participle word set. For example, as mentioned above, the word segmentation vocabularies in the target text are a1, a2, a3, a4 and a5; comparison with the emotion vocabulary library determines that a1, a2 and a3 form the word segmentation vocabulary set and that the library does not include a4 and a5, so a4 and a5 are taken as the difference vocabulary set. Schematically, vector operations on a4 and a5 yield the participle vectors b4 and b5, and vector operations on a1, a2 and a3 yield the corresponding participle vectors b1, b2 and b3. The vectors b4 and b5 are each compared for similarity with b1, b2 and b3.
Because the emotional tendency of a target text is relatively fixed, performing the similarity operation only between the participle word set and the difference vocabulary set, both drawn from the target text, reduces the amount of calculation. After the similarity between a difference vocabulary in the difference vocabulary set and each participle word in the participle word set is determined, the participle word with the highest similarity is taken as the third candidate emotion word.
And step 450, adding the participle words to the emotion word library based on the basic emotion degree of the third candidate emotion words, and updating the emotion word library.
After the third candidate emotion word is determined, the basic emotion degree of the difference word can be determined based on the basic emotion degree of the third candidate emotion word and the similarity between the difference word and the third candidate emotion word. Because the difference word is itself a participle word, the participle word that does not exist in the emotion word library and its basic emotion degree (namely, the difference word and its basic emotion degree) are added to the emotion word library to update it.
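Steps 430 to 450 can be sketched as follows. This is an illustration under stated assumptions: the toy two-dimensional vectors and the lexicon values are invented, the function names are hypothetical, and scoring the difference word as similarity × basic emotion degree mirrors step 420 rather than a rule stated for step 450.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def update_from_target_text(words, vectors, lexicon):
    """Score words missing from the lexicon using matched words of the same text."""
    matched = [w for w in words if w in lexicon]         # segmented word set
    difference = [w for w in words if w not in lexicon]  # difference word set
    added = {}
    if not matched:
        return added
    for d in difference:
        # Third candidate: the matched word most similar to the difference word.
        best = max(matched, key=lambda m: cosine(vectors[d], vectors[m]))
        sim = cosine(vectors[d], vectors[best])
        # Assumed scoring rule, mirroring step 420.
        added[d] = sim * lexicon[best]
    lexicon.update(added)
    return added


# Toy data: a1..a3 are in the lexicon, a4 and a5 are not.
vectors = {"a1": [1, 0], "a2": [0, 1], "a3": [1, 1], "a4": [2, 0], "a5": [0, 3]}
lexicon = {"a1": 0.5, "a2": -0.4, "a3": 0.1}
added = update_from_target_text(["a1", "a2", "a3", "a4", "a5"], vectors, lexicon)
print(added)
```

Only the matched words of the target text are searched, rather than the whole emotion vocabulary library, which is where the reduction in calculation comes from.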
In summary, firstly, word segmentation processing is performed on an obtained target text to obtain word segmentation words, then modified words are searched within a range with a certain character distance from the word segmentation words, the modified words are used for modifying the word segmentation words, so that basic emotion degrees of the word segmentation words can be affected, then the word segmentation words are analyzed in combination with the modified words, more accurate emotion degrees corresponding to the word segmentation words can be obtained, and finally, an analysis result of target text emotion is determined based on the emotion degrees corresponding to a plurality of word segmentation words in the target text. Through the method, the word segmentation words and the modified words in a certain character number threshold range are combined and analyzed, and then the analysis result of the target text emotion is determined, so that when the target text is subjected to emotion analysis, the range of the modified words is more determined, the considered factors are more comprehensive, and the analysis result of the target text is more accurate.
In the method provided in this embodiment, it is considered that the emotion vocabulary in the existing emotion vocabulary library does not necessarily include all the word segmentation vocabularies in the target text, and therefore, the emotion vocabulary library may be updated first, for example: the method comprises the steps of calculating the similarity between word segmentation vocabularies and emotion vocabularies according to vectors between the word segmentation vocabularies and the emotion vocabularies, determining the basic emotion degree of the word segmentation vocabularies according to the numerical value corresponding to the emotion vocabularies with the highest similarity and the similarity, achieving the purpose of updating an emotion vocabulary library, and updating the emotion vocabulary library to enable the emotion vocabulary library to contain more emotion vocabularies, so that the method is beneficial to emotion word segmentation of a target text and can assist emotion analysis of similar texts.
Fig. 5 is a flowchart of a text sentiment analysis method according to an exemplary embodiment of the present application, taking as an example that the method is applied to a terminal to analyze comments in a piece of social software, as shown in fig. 5, the method includes the following steps.
Step 510, data acquisition.
Optionally, firstly, determining topics to be analyzed (such as entertainment topics, science and technology topics, sports topics and the like) from a plurality of fields divided by the social software, and specifying at least one target keyword and at least one target time period according to the topics to be analyzed, wherein the specified target keyword can assist in extracting target texts with the target keyword; the specified target time period can determine the text to be analyzed according to the time range, and the extraction range of the text to be analyzed is refined. Schematically, the theme to be analyzed is a scientific theme, namely a mobile phone brand; the target keyword comprises the name of the mobile phone and the model of the mobile phone; and analyzing the emotional expression of the mobile phone in the social software in the last month.
Illustratively, the data acquisition may be performed by calling an Application Programming Interface (API) 511 of the social software, and acquiring a text to be analyzed 513 containing a target keyword in the social software in the last month through a port 512.
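The selection by target keyword and target time period can be sketched as a filter over records that have already been fetched. No real API of any social software is shown: the record layout (`text`, `time`), the function name, the keyword "PhoneX" and the dates are all hypothetical.

```python
from datetime import datetime, timedelta


def select_texts(records, keywords, start, end):
    """Keep texts that mention a target keyword and fall inside [start, end]."""
    return [
        r["text"]
        for r in records
        if start <= r["time"] <= end and any(k in r["text"] for k in keywords)
    ]


# Hypothetical records standing in for the data returned by the API.
now = datetime(2021, 12, 31)
records = [
    {"text": "PhoneX battery is great", "time": datetime(2021, 12, 20)},
    {"text": "PhoneX camera is poor", "time": datetime(2021, 10, 1)},
    {"text": "Nice weather today", "time": datetime(2021, 12, 25)},
]
to_analyze = select_texts(records, ["PhoneX"], now - timedelta(days=30), now)
print(to_analyze)  # ['PhoneX battery is great']
```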
Step 520, data preprocessing.
Optionally, after the text to be analyzed is acquired from the social software, text content interfering emotion analysis such as advertisements and spam information may exist in the text to be analyzed, illustratively, the interference text content in the text to be analyzed is filtered by adopting an advertisement and spam filtering technology 521, the high-quality text content to be analyzed is extracted and obtained as a target text 522, and the target text 522 is used as input data for emotion analysis, so that the emotion tendency and the emotion degree of the target text 522 in the social software are automatically analyzed in the following process.
Step 530, an emotion dictionary is created.
Illustratively, the emotion dictionary can be selected to be updated based on the existing emotion dictionary (basic emotion dictionary), and by comparing the respective characteristics of the existing emotion dictionaries, 28803 emotion vocabularies are found to be included in one emotion dictionary 531, the total amount of the emotion vocabularies is rich, the emotion vocabularies not only include the emotion tendencies of the emotion vocabularies, but also include the emotion tendency degree scores of the emotion vocabularies, the requirement of emotion analysis on the target text is met, and therefore the emotion dictionary 531 is selected as the basic emotion dictionary. In the emotion dictionary 531, the emotion degree score of the positive emotion vocabulary is greater than 0, the emotion degree score of the negative emotion vocabulary is less than 0, the emotion degree score of the neutral vocabulary is 0, the larger the absolute value of the score is, the larger the emotion tendency degree is, the emotion vocabulary and the emotion degree score are separated by commas, and the emotion dictionary is exemplified as follows:
nausea, -0.267935;
praise, 0.4410439;
cold warfare, -0.61774685;
satisfactory success, 0.2998128;
happy, 0.681309;
camouflage, -0.5;
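A minimal sketch of reading entries in the format shown above (word, comma, score, trailing semicolon) and deriving the emotional tendency from the sign of the score. The function names are illustrative, and the exact file layout of the dictionary beyond these sample lines is an assumption.

```python
def parse_lexicon(lines):
    """Parse 'word, score;' entries into a dict of word -> float score."""
    lexicon = {}
    for line in lines:
        entry = line.strip().rstrip(";").strip()
        if not entry:
            continue
        # Split on the last comma so that the score is always the final field.
        word, _, score = entry.rpartition(",")
        lexicon[word.strip()] = float(score)
    return lexicon


def polarity(score):
    """Positive above 0, negative below 0, neutral at exactly 0."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sample = ["nausea, -0.267935;", "praise, 0.4410439;", "happy, 0.681309;", "camouflage, -0.5;"]
parsed = parse_lexicon(sample)
print({word: polarity(score) for word, score in parsed.items()})
```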
Because the emotion vocabulary in the basic emotion dictionary is limited while natural language expression is highly varied, no complete, general-purpose emotion dictionary exists in the field of text emotion analysis research, and applying the basic emotion dictionary directly to texts on topics from different fields can give low accuracy. The emotion vocabulary in the emotion dictionary can therefore be expanded automatically for each topic on the basis of the basic emotion dictionary. Illustratively, the emotion dictionary 532 is updated based on the Word2Vec model, generating an enhanced emotion dictionary 533. By training on an input corpus, Word2Vec maps each word in the corpus to a low-dimensional dense vector of fixed dimensionality, and the similarity or distance between vectors reflects the semantic similarity or distance between words. Words in the corpus that are highly similar to words in the existing emotion dictionary can thus be added to the emotion dictionary, generating the enhanced emotion dictionary 533. The specific expansion scheme comprises the following steps:
1. Performing Chinese word segmentation on the text corpus obtained in the data preprocessing stage, using the segmented text corpus as input to the Word2Vec model, and training a low-dimensional vector for each word in the text corpus. The text corpus is the text library containing the target text; analyzing the target text comprises analyzing every text in the text corpus, or analyzing only some of the texts, in which case the texts that are not analyzed are used only for updating the emotion dictionary. Alternatively, the emotion dictionary is updated using only the target text.
2. Finding the set of words that appear in both the text corpus and the emotion dictionary 531 and, for each word in this set, finding the several words in the text corpus that are most similar to it and recording the similarities. The similarity is computed as the cosine similarity between word vectors.
3. Adding the participle vocabulary which does not appear in the basic emotion dictionary and has a similarity greater than 0.8 with a certain emotion vocabulary in the basic emotion dictionary to the basic emotion dictionary, and setting the score of the participle vocabulary as the score of the corresponding emotion vocabulary in the basic emotion dictionary multiplied by the similarity to obtain an enhanced emotion dictionary 533.
Illustratively, the basic emotion dictionary contains the emotion word and emotion degree "happy, 0.681309". After Word2Vec training on the text corpus and similarity calculation, the cosine similarity between "hi skin" and "happy" is found to be 0.83, and "hi skin" is not in the basic emotion dictionary. The emotion degree score of "hi skin" is therefore set to 0.681309 × 0.83 = 0.56548647, and "hi skin, 0.56548647" is added to the basic emotion dictionary, updating it. This procedure is applied to all words in the text corpus to generate the final enhanced emotion dictionary 533, which completes the creation of the emotion dictionary.
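The expansion steps above can be sketched without a trained model by assuming the most similar corpus words and their cosine similarities have already been obtained (for example from Word2Vec vectors). The function name, the `neighbours` structure and the extra word "fine" are hypothetical; keeping the score from the most similar emotion word when several qualify is an assumption of this sketch. The entry for "hi skin" reproduces the example in the text.

```python
def expand_lexicon(base_lexicon, neighbours, threshold=0.8):
    """Return an enhanced lexicon built from precomputed similarities.

    neighbours maps an emotion word to a list of
    (corpus word, cosine similarity) pairs.
    """
    enhanced = dict(base_lexicon)
    best_sim = {}
    for emotion_word, similar in neighbours.items():
        if emotion_word not in base_lexicon:
            continue
        for word, sim in similar:
            # Only new words whose similarity is greater than the threshold.
            if word in base_lexicon or sim <= threshold:
                continue
            # Assumption: when several emotion words qualify, use the closest.
            if sim > best_sim.get(word, 0.0):
                best_sim[word] = sim
                enhanced[word] = base_lexicon[emotion_word] * sim
    return enhanced


base = {"happy": 0.681309}
neighbours = {"happy": [("hi skin", 0.83), ("fine", 0.75)]}
enhanced = expand_lexicon(base, neighbours)
print(round(enhanced["hi skin"], 8))
```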
Step 540, emotion degree analysis.
Optionally, the segmented words in the target text have emotion words matched with the segmented words in an emotion dictionary, but the analysis result of the emotion of the target text is not only related to the emotion degrees of the segmented words, and considering that in the actual natural language text, the emotion words are often modified by negative words or degree adverbs to cause the emotion tendency or degree of the text to change, a shallow syntactic analysis structure can be adopted to analyze relatively simple components in some structures in the target text, so that the emotion influence of the segmented words on the target text and the structure influence in sentences in the target text are considered. Illustratively, the vocabulary which can influence the emotion degree of the participle vocabulary is analyzed by adopting a degree level word dictionary provided by HowNet.
Optionally, the vocabulary that can influence the emotional degree of the participle vocabulary is a modified vocabulary, and the modified vocabulary at least comprises a negative word and a degree adverb. The negative word is one of adverbs, is a word representing negative meaning, has unique grammatical meaning and influence in the text, and analysis shows that the emotion word modified by the negative word tends to change the emotion polarity of the emotion word. When a negative word modifies a positive emotion vocabulary, the originally expressed positive emotion becomes neutral emotion or negative emotion, and special treatment is performed for the situation. "degree" in the adverb of degrees means that a quantity is at a certain level in the corresponding hierarchical sequence, and is a hierarchical representation of the quantity. Because most of the information in the social software is published instantly, the social software has the characteristics of less text content and wide information content, and non-written writing brings a great amount of adverbs to limit or modify the expression of users in aspects of viewpoints, places, attitudes and the like. Such as "very", "super", "very", etc. If the above-mentioned degree adverb modification emotion vocabulary exists in the target text, the emotion intensity of the corresponding emotion vocabulary in the target text needs to be adjusted.
Illustratively, a shallow syntactic analysis technique is used to automatically determine whether an emotion word in the target text is modified by a negative word or a degree adverb. The implementation checks whether the 5 characters preceding the emotion word in the text contain a negative word or a degree adverb: if a negative word is contained, the emotion degree score of that emotion word in the text is not counted; if a degree adverb is contained, the emotion degree score of that emotion word in the text is increased by one time, that is, doubled.
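The 5-character window check can be sketched as follows. The word lists, function names and sample sentences are hypothetical, and short English words are used only so that a modifier fits inside a 5-character window; plain substring matching is a simplification of this sketch.

```python
NEGATIONS = ("not", "no")        # hypothetical negative-word list
DEGREE_ADVERBS = ("very", "too")  # hypothetical degree-adverb list
WINDOW = 5                        # characters inspected before the emotion word


def score_occurrence(text, start, base_degree):
    """Score one emotion word found at index `start` of `text`."""
    window = text[max(0, start - WINDOW):start]
    if any(n in window for n in NEGATIONS):
        return 0.0                # negated: the score is not counted
    if any(d in window for d in DEGREE_ADVERBS):
        return base_degree * 2    # degree adverb: the score is doubled
    return base_degree


def score_text(text, lexicon):
    """Sum the adjusted scores of every lexicon word occurring in `text`."""
    total = 0.0
    for word, degree in lexicon.items():
        start = text.find(word)
        while start != -1:
            total += score_occurrence(text, start, degree)
            start = text.find(word, start + len(word))
    return total


lexicon = {"happy": 0.681309}
for sample in ("happy", "very happy", "not happy"):
    print(sample, round(score_text(sample, lexicon), 6))
```

When both a negative word and a degree adverb fall inside the window, this sketch lets the negative word win, which matches the "0 setting operation" giving a result of 0.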
The enhanced emotion dictionary is combined with the shallow syntactic analysis technique to automatically analyze the emotion of each target text in the corpus. The emotion degrees of all emotion words from the emotion dictionary that appear in the target text are added to obtain an accumulated value (the overall score), which is the analysis result of the target text emotion and reflects both the emotional tendency of the target text and the degree of that tendency. Illustratively, if the overall score is greater than 0, the target text expresses positive emotion; if the overall score is less than 0, the target text expresses negative emotion; if the overall score is equal to 0, the target text expresses neutral (objective) emotion. Alternatively, neutral emotion is assigned a threshold range, for example from -0.5 to 0.5: when the overall score is greater than 0.5, the target text expresses positive emotion; when the overall score is less than -0.5, the target text expresses negative emotion; and when the overall score is between -0.5 and 0.5, the target text expresses neutral emotion. The above description is merely exemplary, and the present application is not limited thereto.
Alternatively, the absolute value of the overall score is calculated, and its magnitude represents the degree of emotional tendency. For example, if the overall score of target text Y is -1.8, its absolute value is 1.8, so the emotional tendency degree of target text Y is 1.8.
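The aggregation and classification can be sketched in a few lines. The function name and the `neutral_band` parameter are hypothetical: a band of 0 gives the sign-based rule, and a band of 0.5 gives the thresholded variant. The individual degrees in the usage example are invented so that they sum to the overall score of -1.8 used in the text.

```python
def classify(degrees, neutral_band=0.0):
    """Return (tendency, tendency degree) for the emotion degrees of one text."""
    total = sum(degrees)
    if total > neutral_band:
        tendency = "positive"
    elif total < -neutral_band:
        tendency = "negative"
    else:
        tendency = "neutral"
    # The absolute value of the overall score is the degree of tendency.
    return tendency, abs(total)


tendency, degree = classify([-1.0, -0.8])
print(tendency, round(degree, 2))
print(classify([0.3, 0.1], neutral_band=0.5)[0])
```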
In summary, in the method provided in this embodiment, first, a word segmentation process is performed on an obtained target text to obtain a word segmentation word, then a modified word is searched in a range having a certain distance from the word segmentation word to a certain character, and because the modified word is used for modifying the word segmentation word, the basic emotion degree of the word segmentation word is affected, then, the word segmentation word is analyzed in combination with the modified word, so that a more accurate emotion degree corresponding to the word segmentation word can be obtained, and finally, an emotion analysis result of the target text is determined based on the emotion degrees corresponding to a plurality of word segmentation words in the target text. Through the method, the word segmentation vocabularies and the modified vocabularies within a certain character number threshold range are combined and analyzed, and further the analysis result of the target text emotion is determined, so that when the target text is subjected to emotion analysis, the range of the modified vocabularies is more determined, the considered factors are more comprehensive, and the analysis result of the target text is more accurate.
Fig. 6 is a block diagram of a text emotion analysis apparatus according to an exemplary embodiment of the present application. As shown in fig. 6, the apparatus includes the following components:
the text acquisition module 610 is configured to acquire a target text, where the target text is a text to be subjected to emotion analysis;
a word segmentation processing module 620, configured to perform word segmentation processing on the target text to obtain word segmentation words in the target text;
a vocabulary acquiring module 630, configured to acquire a modified vocabulary in the target text, where a distance between the modified vocabulary and the participle vocabulary is within a character number threshold range, where the modified vocabulary is used to modify the participle vocabulary;
a degree determining module 640, configured to determine, based on the matching relationship between the word segmentation vocabulary and an emotion vocabulary library and the modified vocabulary, an emotion degree corresponding to the word segmentation vocabulary, where the emotion vocabulary library includes an emotion vocabulary labeled with a basic emotion degree;
and the result determining module 650 is configured to determine an emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation vocabulary in the target text.
As shown in fig. 7, in an alternative embodiment, the degree determining module 640 includes:
the degree obtaining module 641 is configured to obtain a basic emotion degree corresponding to the emotion vocabulary in response to the matching relationship between the word segmentation vocabulary and the emotion vocabulary in the emotion vocabulary library;
and the degree adjusting module 642 is configured to adjust the basic emotion degree based on the modified vocabulary, so as to obtain an emotion degree corresponding to the segmented vocabulary.
In an alternative embodiment, the modified vocabulary includes at least one of negative words and degree adverbs;
the degree adjusting module 642 is configured to perform a first adjustment on the basic emotion degree in response to the modified vocabulary including the negative word; and, in response to the modified vocabulary including the degree adverb, perform a second adjustment corresponding to the degree adverb on the basic emotion degree.
In an optional embodiment, the degree adjusting module 642 is further configured to, in response to the word segmentation vocabulary corresponding to at least two modified vocabularies, perform a superposed adjustment on the basic emotion degree based on the adjustment manners respectively corresponding to the at least two modified vocabularies.
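The first, second, and superposed adjustments can be sketched as below. The concrete adjustment rules are assumptions, since the passage above does not fix them: here the first adjustment (negative word) is taken to flip the sign, the second adjustment (degree adverb) is taken to multiply by a per-adverb weight, and superposition applies each modifier in turn. The word lists and weights are hypothetical.

```python
from typing import Iterable

# Hypothetical modifier tables used only for this illustration.
NEGATION_WORDS = {"not", "never"}
DEGREE_WEIGHTS = {"very": 1.5, "slightly": 0.5}


def adjust_degree(base: float, modifiers: Iterable[str]) -> float:
    """Apply every modifier's adjustment to the basic emotion degree in sequence."""
    degree = base
    for word in modifiers:
        if word in NEGATION_WORDS:
            degree = -degree                 # assumed first adjustment: invert polarity
        elif word in DEGREE_WEIGHTS:
            degree *= DEGREE_WEIGHTS[word]   # assumed second adjustment: scale strength
    return degree


if __name__ == "__main__":
    print(adjust_degree(1.0, ["very"]))          # degree adverb only
    print(adjust_degree(1.0, ["not", "very"]))   # two modifiers, superposed
```

Because both assumed adjustments are multiplicative, the order of the modifiers does not change the result in this sketch; a scheme in which "not very" and "very not" should differ would need order-aware rules.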
In an alternative embodiment, the degree obtaining module 641 is configured to obtain a basic emotion degree corresponding to the emotion vocabulary in response to the emotion vocabulary library including the same emotion vocabulary as the participle vocabulary; or responding to the emotion vocabulary library not including the emotion vocabulary same as the word segmentation vocabulary, acquiring a first candidate emotion vocabulary which has the highest similarity with the word segmentation vocabulary and reaches a matching similarity threshold in the emotion vocabulary library, and determining the basic emotion degree corresponding to the word segmentation vocabulary based on the basic emotion degree of the first candidate emotion vocabulary and the similarity between the first candidate emotion vocabulary and the word segmentation vocabulary.
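A minimal sketch of this lookup follows. The names `base_degree`, `match_threshold`, and `string_similarity` are hypothetical. Two points are assumptions: the similarity measure (a character-level ratio from Python's `difflib` stands in for whatever measure an implementation would use, such as word-vector similarity), and the way the candidate's degree and the similarity are combined (a product is assumed here).

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional


def string_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real system might compare word vectors."""
    return SequenceMatcher(None, a, b).ratio()


def base_degree(word: str, lexicon: Dict[str, float],
                match_threshold: float = 0.8,
                similarity: Callable[[str, str], float] = string_similarity
                ) -> Optional[float]:
    """Return the basic emotion degree for `word`, or None if nothing matches."""
    if word in lexicon:              # the library contains the same emotion word
        return lexicon[word]
    if not lexicon:
        return None
    # First candidate: the emotion word with the highest similarity.
    best = max(lexicon, key=lambda w: similarity(word, w))
    score = similarity(word, best)
    if score < match_threshold:      # the candidate must reach the matching threshold
        return None
    return score * lexicon[best]     # assumed combination: similarity times degree


if __name__ == "__main__":
    lexicon = {"happy": 2.0, "sad": -2.0}
    print(base_degree("happy", lexicon))
    print(base_degree("happily", lexicon, match_threshold=0.5))
```

Passing the similarity function as a parameter keeps the lookup logic independent of how similarity is actually computed.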
In an optional embodiment, the apparatus further comprises:
an emotion word determination module 660, configured to determine a second candidate emotion word in the emotion word bank, which has the highest similarity with the participle words, in response to that the participle words do not match with the emotion words in the emotion word bank;
and the word bank updating module 670 is configured to add the word segmentation words to the emotion word bank based on the basic emotion degree of the second candidate emotion words, and update the emotion word bank.
In an alternative embodiment, the degree determining module 640 is further configured to determine a set of participle words in the target text that match the emotion vocabulary library; in response to the fact that the participle words are not matched with the emotion word library, determining a third candidate emotion word with the highest similarity to the participle words in the participle word set; and adding the word segmentation words to the emotion word library based on the basic emotion degree of the third candidate emotion words, and updating the emotion word library.
In an alternative embodiment, the lexicon updating module 670 is further configured to determine the product of the similarity between the second candidate emotion word and the word segmentation word and the basic emotion degree of the second candidate emotion word as the basic emotion degree corresponding to the word segmentation word; and to add the word segmentation word to the emotion word library based on the basic emotion degree corresponding to the word segmentation word.
In an optional embodiment, the emotion word determination module 660 is further configured to determine the second candidate emotion word in the emotion word library, which has the highest similarity to the segmented word and reaches an update similarity threshold.
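The library update described for modules 660 and 670 can be sketched as follows. The function name `update_lexicon` and the default `update_threshold` value are hypothetical, and the similarity function is supplied by the caller because the passage above does not fix a particular measure; the product rule follows the text.

```python
from typing import Callable, Dict


def update_lexicon(word: str, lexicon: Dict[str, float],
                   similarity: Callable[[str, str], float],
                   update_threshold: float = 0.6) -> bool:
    """Add an unmatched word to the emotion lexicon; return True if it was added."""
    if word in lexicon or not lexicon:
        return False
    # Second candidate: the emotion word most similar to the unmatched word.
    best = max(lexicon, key=lambda w: similarity(word, w))
    score = similarity(word, best)
    if score < update_threshold:     # the candidate must reach the update threshold
        return False
    # Basic emotion degree of the new word = similarity x candidate's degree.
    lexicon[word] = score * lexicon[best]
    return True


if __name__ == "__main__":
    lexicon = {"happy": 2.0, "sad": -2.0}
    toy_similarity = lambda a, b: 0.9 if (a, b) == ("glad", "happy") else 0.1
    print(update_lexicon("glad", lexicon, toy_similarity), lexicon)
```

Scaling by the similarity means a word that is only loosely related to its candidate enters the library with a weaker degree than the candidate itself.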
In an alternative embodiment, the emotion vocabulary in the emotion vocabulary library comprises a positive vocabulary, a negative vocabulary and a neutral vocabulary; wherein the basic emotion degree of the positive vocabulary is a positive number; the basic emotion degree of the negative vocabulary is a negative number; and the basic emotion degree of the neutral vocabulary is 0.
In an optional embodiment, the result determining module 650 is further configured to perform a summation operation on the emotion degrees of the word segments in the target text, so as to obtain an emotion degree score corresponding to the target text; and determining an emotion analysis result corresponding to the target text based on the emotion degree score corresponding to the target text.
In an optional embodiment, the text obtaining module 610 is further configured to obtain a text set, where the text set is a set of text contents obtained from a preset database; and carrying out noise filtration on the text set, and acquiring the target text from the text set after the noise filtration.
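Noise filtering of a text set might look like the sketch below. What counts as noise is not specified in the passage above, so the rules here are purely assumptions chosen for illustration: URLs are stripped, empty entries are dropped, and exact duplicates are removed. The function name `filter_noise` is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed notion of noise: URLs, blank entries, and exact duplicates.
URL_PATTERN = re.compile(r"https?://\S+")


def filter_noise(texts: Iterable[str]) -> List[str]:
    """Return cleaned, de-duplicated texts in their original order."""
    seen = set()
    cleaned = []
    for text in texts:
        stripped = URL_PATTERN.sub("", text).strip()
        if not stripped or stripped in seen:
            continue
        seen.add(stripped)
        cleaned.append(stripped)
    return cleaned


if __name__ == "__main__":
    raw = ["great game http://x.y/z", "", "great game", "  bad lag "]
    print(filter_noise(raw))
```

Each element of the filtered list can then be taken as a target text for the analysis steps.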
It should be noted that: the text emotion analyzing apparatus provided in the above embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the text emotion analysis device provided by the above embodiment and the text emotion analysis method embodiment belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.
Fig. 8 shows a schematic structural diagram of a server according to an exemplary embodiment of the present application. The server 800 includes a Central Processing Unit (CPU) 801, a system Memory 804 including a Random Access Memory (RAM) 802 and a Read Only Memory (ROM) 803, and a system bus 805 connecting the system Memory 804 and the CPU 801. The server 800 also includes a mass storage device 806 for storing an operating system 813, application programs 814, and other program modules 815.
The mass storage device 806 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 806 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 806 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, erasable Programmable Read-Only Memory (EPROM), electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other solid state Memory technology, CD-ROM, digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 804 and mass storage device 806 as described above may be collectively referred to as memory.
According to various embodiments of the application, the server 800 may also be run by a remote computer connected through a network, such as the Internet. That is, the server 800 may be connected to the network 812 through the network interface unit 811 coupled to the system bus 805, or the network interface unit 811 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
Embodiments of the present application further provide a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for analyzing text emotion provided by the above method embodiments.
Embodiments of the present application further provide a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored on the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the method for analyzing text emotion provided by the above method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to make the computer device execute the method for analyzing text emotion described in any of the above embodiments.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is intended only to illustrate the alternative embodiments of the present application, and should not be construed as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (16)

1. A method for analyzing text emotion, the method comprising:
acquiring a target text, wherein the target text is a text to be subjected to emotion analysis;
performing word segmentation processing on the target text to obtain word segmentation words in the target text;
acquiring, in the target text, a modified vocabulary whose distance from the participle vocabulary is within a character number threshold range, wherein the modified vocabulary is used for modifying the participle vocabulary;
determining the emotion degree corresponding to the word segmentation words based on the matching relation between the word segmentation words and an emotion word library and the modified words, wherein the emotion word library comprises emotion words marked with basic emotion degrees;
and determining an emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation words in the target text.
2. The method of claim 1, wherein the determining the emotional degree corresponding to the segmented words based on the matching relationship between the segmented words and the emotion word library and the modified words comprises:
responding to the matching relation between the word segmentation vocabulary and the emotion vocabulary in the emotion vocabulary library, and acquiring a basic emotion degree corresponding to the emotion vocabulary;
and adjusting the basic emotion degree based on the modified vocabulary to obtain the emotion degree corresponding to the word segmentation vocabulary.
3. The method of claim 2, wherein the modified vocabulary includes at least one of negative words and degree adverbs;
the adjusting the basic emotion degree based on the modified vocabulary comprises the following steps:
responding to the negative words included in the modified vocabulary, and performing first adjustment on the basic emotion degree;
and responding to the modified vocabulary including the degree adverb, and performing second adjustment corresponding to the degree adverb on the basic emotion degree.
4. The method of claim 3, further comprising:
and responding to at least two modified vocabularies corresponding to the word segmentation vocabularies, and performing superposition adjustment on the basic emotion degree based on adjustment modes corresponding to the at least two modified vocabularies respectively.
5. The method of claim 2, wherein the obtaining a base emotional degree corresponding to the emotion vocabulary in response to the matching relationship between the word segmentation vocabulary and the emotion vocabulary in the emotion vocabulary library comprises:
responding to the emotion vocabulary library which comprises the emotion vocabulary same as the word segmentation vocabulary, and acquiring the basic emotion degree corresponding to the emotion vocabulary;
or,
in response to the fact that the emotion vocabulary bank does not include the emotion vocabulary which is the same as the participle vocabulary, acquiring a first candidate emotion vocabulary which has the highest similarity with the participle vocabulary and reaches a matching similarity threshold value in the emotion vocabulary bank, and determining a basic emotion degree corresponding to the participle vocabulary based on the basic emotion degree of the first candidate emotion vocabulary and the similarity between the first candidate emotion vocabulary and the participle vocabulary.
6. The method of claim 2, further comprising:
in response to the word segmentation words not being matched with the emotion words in the emotion word library, determining second candidate emotion words in the emotion word library, wherein the second candidate emotion words have the highest similarity with the word segmentation words;
and adding the word segmentation words to the emotion word library based on the basic emotion degree of the second candidate emotion words, and updating the emotion word library.
7. The method of claim 2, further comprising:
determining a word segmentation vocabulary set matched with the emotion vocabulary library in the target text;
in response to the fact that the participle words are not matched with the emotion word library, determining a third candidate emotion word with the highest similarity to the participle words in the participle word set;
and adding the word segmentation words to the emotion word library based on the basic emotion degree of the third candidate emotion words, and updating the emotion word library.
8. The method of claim 6, wherein the adding the segmented words to the emotion vocabulary library based on the base emotion degree of the second candidate emotion word comprises:
determining the product of the similarity between the second candidate emotion word and the participle vocabulary and the basic emotion degree of the second candidate emotion word as the basic emotion degree corresponding to the participle vocabulary;
and adding the word segmentation words to the emotion word library based on the corresponding basic emotion degrees of the word segmentation words.
9. The method of claim 6, wherein the determining a second candidate emotion word in the emotion vocabulary library, which has the highest similarity with the participle vocabulary, comprises:
and determining the second candidate emotional words in the emotional word library, which have the highest similarity with the word segmentation words and reach an update similarity threshold.
10. The method of any of claims 1 to 9, wherein the emotion vocabulary in the emotion vocabulary library comprises a positive vocabulary, a negative vocabulary and a neutral vocabulary;
wherein the basic emotion degree of the positive vocabulary is a positive number;
the basic emotion degree of the negative vocabulary is negative;
the basic emotional degree of the neutral vocabulary is 0.
11. The method according to any one of claims 1 to 9, wherein the determining of the emotion analysis result corresponding to the target text based on the emotion degree of the participle word in the target text comprises:
summing the emotion degrees of word segmentation words in the target text to obtain an emotion degree score corresponding to the target text;
and determining an emotion analysis result corresponding to the target text based on the emotion degree score corresponding to the target text.
12. The method according to any one of claims 1 to 9, wherein the obtaining the target text comprises:
acquiring a text set, wherein the text set is a set of text contents acquired from a preset database;
and carrying out noise filtration on the text set, and acquiring the target text from the text set after the noise filtration.
13. An apparatus for analysis of textual emotion, the apparatus comprising:
the text acquisition module is used for acquiring a target text, wherein the target text is a text to be subjected to emotion analysis;
the word segmentation processing module is used for carrying out word segmentation processing on the target text to obtain word segmentation words in the target text;
the vocabulary acquisition module is used for acquiring a modified vocabulary in the target text, wherein the distance between the modified vocabulary and the participle vocabulary is within a character number threshold range, and the modified vocabulary is used for modifying the participle vocabulary;
the degree determining module is used for determining the emotion degree corresponding to the word segmentation words based on the matching relation between the word segmentation words and an emotion word library and the modified words, wherein the emotion word library comprises emotion words marked with basic emotion degrees;
and the result determining module is used for determining an emotion analysis result corresponding to the target text based on the emotion degree of the word segmentation words in the target text.
14. A computer device comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for analyzing text emotion according to any of claims 1 to 12.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of textual emotion analysis as claimed in any of claims 1 to 12.
16. A computer program product comprising a computer program or instructions which, when executed by a processor, implement a method of text sentiment analysis according to any one of claims 1 to 12.
CN202111658765.0A 2021-09-29 2021-12-31 Text emotion analysis method, device, equipment, medium and program product Pending CN115878752A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111151765 2021-09-29
CN2021111517651 2021-09-29

Publications (1)

Publication Number Publication Date
CN115878752A true CN115878752A (en) 2023-03-31

Family

ID=85756861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111658765.0A Pending CN115878752A (en) 2021-09-29 2021-12-31 Text emotion analysis method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN115878752A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361472A (en) * 2023-05-02 2023-06-30 周维 Public opinion big data analysis system for social network comment hot events
CN116361472B (en) * 2023-05-02 2024-05-03 脉讯在线(北京)信息技术有限公司 Method for analyzing public opinion big data of social network comment hot event
CN117973946A (en) * 2024-03-29 2024-05-03 云南与同加科技有限公司 Teaching-oriented data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40084145; Country of ref document: HK)