CN111552815B - Emotion corpus expansion method and device and computer readable storage medium - Google Patents


Info

Publication number
CN111552815B
Authority
CN
China
Prior art keywords
emotion
standard
corpus
words
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010248850.9A
Other languages
Chinese (zh)
Other versions
CN111552815A (en)
Inventor
过弋
王志宏
尹心明
樊志杰
陈家明
王家辉
张重磊
蔡新玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Third Research Institute of the Ministry of Public Security
Original Assignee
East China University of Science and Technology
Third Research Institute of the Ministry of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology and the Third Research Institute of the Ministry of Public Security
Priority to CN202010248850.9A
Publication of CN111552815A
Application granted
Publication of CN111552815B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of databases and discloses an emotion corpus expansion method, an emotion corpus expansion device and a computer-readable storage medium. The emotion corpus expansion method comprises the following steps:
- obtaining a standard emotion corpus, which comprises a plurality of standard emotion words together with the standard emotion polarity and standard emotion category stored for each standard emotion word;
- acquiring an expanded corpus according to the standard emotion words, and adding the expanded corpus to the standard emotion corpus;
- calculating the emotion polarity of the expanded corpus according to the standard emotion polarities, and storing it in the standard emotion corpus in association with the expanded corpus;
- acquiring the emotion category of the expanded corpus according to the standard emotion categories, and storing it in the standard emotion corpus in association with the expanded corpus.

Compared with the prior art, the emotion corpus expansion method, device and computer-readable storage medium provided by the embodiments of the application allow the emotion corpus to be expanded automatically.

Description

Emotion corpus expansion method and device and computer readable storage medium
Technical Field
The present application relates to the field of databases, and in particular, to a method and apparatus for expanding an emotion corpus, and a computer readable storage medium.
Background
With the development of intelligent bionic technologies such as autonomous learning, research on the emotional expression of sentences has become increasingly advanced. In the prior art, the emotion expressed by a sentence is mostly determined from the emotions expressed by the words in it. The emotion of each word is generally classified and summarized by building an emotion word library.
However, the inventors of the present application found that prior-art emotion word libraries are mainly built and updated manually. After a library is built, staff must keep updating it, which is time-consuming and labor-intensive, and the library cannot be expanded automatically.
Disclosure of Invention
The embodiments of the application aim to provide an emotion corpus expansion method and device and a computer-readable storage medium. With them, the emotion corpus can be updated automatically from its existing content, and new emotion corpus entries can be added automatically.
To solve the above technical problem, an embodiment of the application provides an emotion corpus expansion method comprising the following steps:
- obtaining a standard emotion corpus, which comprises a plurality of standard emotion words together with the standard emotion polarity and standard emotion category stored for each standard emotion word;
- acquiring an expanded corpus according to the standard emotion words, and adding the expanded corpus to the standard emotion corpus;
- calculating the emotion polarity of the expanded corpus according to the standard emotion polarities, and storing the emotion polarity in the standard emotion corpus in association with the expanded corpus;
- acquiring the emotion category of the expanded corpus according to the standard emotion categories, and storing the emotion category in the standard emotion corpus in association with the expanded corpus.
An embodiment of the application further provides an emotion corpus expansion device comprising:
- at least one processor; and
- a memory communicatively connected to the at least one processor.

The memory stores instructions executable by the at least one processor. Executing these instructions enables the at least one processor to perform the emotion corpus expansion method described above.
An embodiment of the application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the emotion corpus expansion method described above.
Compared with the prior art, the embodiments of the application work as follows once the preset standard emotion corpus has been obtained:
- An expanded corpus is acquired automatically from the standard emotion words in the standard emotion corpus, and the newly added expanded corpus is stored in that corpus.
- The emotion polarity of the expanded corpus is calculated from the standard emotion polarities of the standard emotion words.
- The emotion category of the expanded corpus is calculated from their standard emotion categories.

The standard emotion corpus is thus updated and expanded autonomously, reducing the manpower required.
In addition, the expanded corpus includes expansion words. Acquiring the expanded corpus according to the standard emotion words specifically includes:
- taking, as candidate words, the words whose word vector similarity to a standard emotion word is greater than a first preset similarity, so that a plurality of candidate words are obtained;
- taking, as candidate similarities, the word vector similarities between each candidate word and the other candidate words, so that each candidate word has a plurality of candidate similarities;
- taking the number of candidate similarities of each candidate word that are greater than a second preset similarity as the candidate number of that candidate word;
- taking the candidate words whose candidate number is greater than a preset threshold as the expansion words.
In addition, calculating the emotion polarity of the expanded corpus according to the standard emotion polarities specifically includes:
- obtaining the word vector similarity between each standard emotion word and the expansion word as a sampling similarity;
- taking the standard emotion words whose sampling similarity is greater than a third preset similarity as sampling standard emotion words;
- taking the standard emotion polarity of each sampling standard emotion word as its sampling standard emotion polarity;
- calculating, for each sampling standard emotion word, the product of its sampling similarity and its sampling standard emotion polarity;
- summing the products.

If the sum is positive, the emotion polarity of the expansion word is 1; if the sum is negative, it is -1.
In addition, obtaining the emotion category of the expanded corpus according to the standard emotion categories specifically includes:
- taking the standard emotion category of each sampling standard emotion word as its sampling standard emotion category;
- taking the emotion category that occurs most often among the sampling standard emotion categories as the emotion category of the expansion word.
In addition, the expanded corpus includes expanded emoticons. Acquiring the expanded corpus according to the standard emotion words specifically includes:
- acquiring a sentence sample library comprising a plurality of sentences;
- taking the emoticons that appear in the same sentence as a standard emotion word as the expanded emoticons.

Because the expanded corpus can include expanded emoticons, emoticons can also be added to the emotion corpus. This broadens the scenarios in which the emotion corpus can be used.
In addition, calculating the emotion polarity of the expanded corpus according to the standard emotion polarities specifically includes:
- taking the standard emotion words that appear in the same sentence as the expanded emoticon as sampling standard emotion words;
- taking the standard emotion polarity of each sampling standard emotion word as its sampling standard emotion polarity;
- taking the sum of the sampling standard emotion polarities as the emotion polarity of the expanded emoticon.
In addition, obtaining the emotion category of the expanded corpus according to the standard emotion categories specifically includes:
- obtaining the emotion significance and emotion correlation of the expanded emoticon according to the standard emotion categories;
- obtaining the emotion category of the expanded emoticon from the product of the emotion significance and the emotion correlation.

The emotion significance represents how strongly the expanded emoticon expresses each emotion category. The emotion correlation represents how well the expanded emoticon distinguishes between emotion categories.
In addition, obtaining the emotion significance and emotion correlation of the expanded emoticon according to the standard emotion categories specifically includes:
- counting how many times each expanded emoticon appears in the same sentence as the standard emotion words of each emotion category, giving a first co-occurrence count for each pair of expanded emoticon and emotion category;
- counting how many times all expanded emoticons appear in the same sentence as the standard emotion words of each emotion category, giving a second co-occurrence count for each emotion category;
- counting how many times all expanded emoticons appear in the same sentence as the standard emotion words of all emotion categories, giving a third co-occurrence count;
- counting how many times each expanded emoticon appears in the same sentence as the standard emotion words of all emotion categories, giving a fourth co-occurrence count;
- obtaining the emotion significance from the first and second co-occurrence counts, and the emotion correlation from the third and fourth co-occurrence counts.
Drawings
FIG. 1 is a flowchart of an emotion corpus expansion method according to a first embodiment of the present application;
FIG. 2 is a flowchart of obtaining expansion words in the emotion corpus expansion method according to the first embodiment of the present application;
FIG. 3 is a flowchart of obtaining expanded emoticons in the emotion corpus expansion method according to the first embodiment of the present application;
FIG. 4 is a schematic structural diagram of an emotion corpus expansion device according to a second embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the embodiments to help the reader understand the present application. The claimed application can nevertheless be practiced without these details, and with various changes and modifications based on the following embodiments.
The first embodiment of the present application relates to an emotion corpus expansion method. The specific flow, shown in FIG. 1, includes:
Step S101: obtaining a standard emotion corpus, which comprises a plurality of standard emotion words together with the standard emotion polarity and standard emotion category stored for each standard emotion word.
Specifically, in this embodiment, the standard emotion corpus is a pre-established emotion corpus. It stores a plurality of standard emotion words together with the standard emotion polarity and standard emotion category of each.

Standard emotion words are words known to express an emotion clearly. For example, "happy" and similar words express the emotion "happy", while "angry", "annoying" and similar words express the emotion "anger".

It should be noted that the prior art offers many schemes, differing in type and granularity, for classifying the emotions expressed by words. The following table is a commonly used emotion classification table for Chinese words, which divides the emotions expressed by Chinese words into 7 major emotion classes and 21 minor emotion classes. This table is merely one specific classification scheme used in this embodiment and is not limiting. Other embodiments of the present application may use other classification schemes, set flexibly according to actual needs.
In addition, for each standard emotion word, the standard emotion corpus stores its emotion polarity as the standard emotion polarity and its emotion category as the standard emotion category. For example:
- the standard emotion word "happy" has the standard emotion category "happy" and an emotion polarity of 1;
- the standard emotion word "hurt" has the standard emotion category "sad" and an emotion polarity of -1.

Other entries are not listed here.
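The corpus described in step S101 can be pictured as a mapping from each term to its stored polarity and category. The following Python sketch assumes a simple in-memory dictionary, because the description does not specify a storage format. The class and method names (`CorpusEntry`, `EmotionCorpus`, `add`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CorpusEntry:
    """Stored attributes of one term (a word or an emoticon)."""
    polarity: int   # 1 = positive, -1 = negative for words; an emoticon's polarity is an integer sum
    category: str   # emotion category, e.g. "happy" or "sad"


@dataclass
class EmotionCorpus:
    """Hypothetical in-memory standard emotion corpus keyed by term."""
    entries: Dict[str, CorpusEntry] = field(default_factory=dict)

    def add(self, term: str, polarity: int, category: str) -> None:
        # Store the term in association with its polarity and category;
        # an existing entry for the same term is overwritten.
        self.entries[term] = CorpusEntry(polarity, category)

    def polarity(self, term: str) -> int:
        return self.entries[term].polarity

    def category(self, term: str) -> str:
        return self.entries[term].category


# The two example entries from the description.
corpus = EmotionCorpus()
corpus.add("happy", 1, "happy")
corpus.add("hurt", -1, "sad")
```

Expanded words and emoticons computed in steps S102 to S104 would be stored through the same `add` call.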
Step S102: obtaining an expanded corpus according to the standard emotion words, and adding the expanded corpus to the standard emotion corpus.
Specifically, in this embodiment, the expanded corpus includes at least one of two types: expansion words and expanded emoticons.
Further, as shown in FIG. 2, obtaining the expansion words specifically includes:
Step S201: taking, as candidate words, the words whose word vector similarity to a standard emotion word is greater than a first preset similarity, so that a plurality of candidate words are obtained.
Specifically, in this step, for each standard emotion word, the words whose word vector similarity to it is greater than the first preset similarity are taken as candidate words. The first preset similarity is user-defined, and its value can be set flexibly as required. For example, suppose the word vector similarity between the standard emotion word "happy" and the non-standard emotion word "joyful and happy" is 0.64. Then "joyful and happy" is a candidate word whenever the first preset similarity is smaller than 0.64.
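Step S201 can be sketched as a similarity search over word vectors. The description names neither the similarity measure nor the source of the vectors. This Python sketch therefore assumes cosine similarity over precomputed vectors and skips words that are already standard emotion words. The two-dimensional vectors and the words "glad", "sad" and "table" are hypothetical toy data.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two word vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_words(standard_words: List[str],
                    vectors: Dict[str, Sequence[float]],
                    first_threshold: float) -> List[str]:
    """Step S201: words whose similarity to some standard emotion word exceeds the first preset similarity."""
    standard = set(standard_words)
    candidates = []
    for word, vec in vectors.items():
        if word in standard:
            continue  # assumption: standard emotion words are already in the corpus
        if any(cosine(vec, vectors[s]) > first_threshold for s in standard_words if s in vectors):
            candidates.append(word)
    return candidates


toy_vectors = {"happy": [1.0, 0.0], "glad": [0.9, 0.1], "sad": [-1.0, 0.0], "table": [0.0, 1.0]}
print(candidate_words(["happy"], toy_vectors, 0.5))  # ['glad']
```

In practice the vectors would come from a trained embedding model. Which model to use is not specified in the description.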
Step S202: taking, as candidate similarities, the word vector similarities between each candidate word and the other candidate words, so that each candidate word has a plurality of candidate similarities.
Specifically, in this step, after a plurality of candidate words have been obtained, the word vector similarities between them are calculated pairwise, giving a plurality of candidate similarities. For example, for the standard emotion word "happy", the candidate words "open heart", "joyful and happy", "24774 treating", "happy music" and "happy" are obtained first, and their pairwise similarities are then calculated.
Step S203: taking the number of candidate similarities of each candidate word that are greater than a second preset similarity as the candidate number of that candidate word.
Specifically, in this embodiment, each candidate word has a plurality of candidate similarities. The number of them that are greater than the second preset similarity is taken as the candidate number of that candidate word. For example, the word vector similarities between "open heart", "joyful and happy", "24774 treating", "happy music" and "happy" are shown in the following table:
                  Open heart   Joyful and happy   24774 treating   Happy music   Happy
Open heart        -            0.61               0.37             0.59          0.58
Joyful and happy  0.61         -                  0.32             0.63          0.64
24774 treating    0.37         0.32               -                0.32          0.45
Happy music       0.59         0.63               0.32             -             0.58
Happy             0.58         0.64               0.45             0.58          -
If the second preset similarity is set to 0.5, then "open heart", "joyful and happy", "happy music" and "happy" each have a candidate number of 3, and "24774 treating" has a candidate number of 0.
Step S204: taking the candidate words whose candidate number is greater than a preset threshold as expansion words.
Specifically, in this embodiment, the preset threshold is set by the user as needed. For example, if the preset threshold is set to 2, then "open heart", "joyful and happy", "happy music" and "happy" are taken as expansion words, while "24774 treating" is not.
In addition, in this embodiment, when the expanded corpus includes expanded emoticons, the expanded emoticons are acquired from the standard emotion words in the following steps, shown in FIG. 3:
Step S301: acquiring a sentence sample library comprising a plurality of sentences.
Specifically, in this embodiment, the sentence sample library is an arbitrarily obtained collection of online chat records containing a plurality of sentences. This is only one specific example in this embodiment and is not limiting. In other embodiments of the present application, the sentence sample library may be any other collection of sentences containing emoticons, which are not listed here.
Step S302: taking the emoticons that appear in the same sentence as a standard emotion word as the expanded emoticons.
Specifically, in this embodiment, an emoticon that appears in the same sentence as a standard emotion word such as "happy" is taken as an expanded emoticon.
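Steps S301 and S302 amount to a co-occurrence scan over the sentence sample library. The following Python sketch makes two assumptions: sentences are already tokenized, and emoticons are recognized by membership in a known inventory. The bracketed emoticon tokens, the sample chat log and the function name are hypothetical.

```python
from typing import Iterable, List, Set


def extract_emoticons(sentences: Iterable[List[str]],
                      standard_words: Set[str],
                      emoticon_inventory: Set[str]) -> List[str]:
    """Return, sorted, the emoticons that share a sentence with at least one standard emotion word."""
    found: Set[str] = set()
    for tokens in sentences:
        # Only sentences containing a standard emotion word contribute emoticons.
        if any(token in standard_words for token in tokens):
            found.update(token for token in tokens if token in emoticon_inventory)
    return sorted(found)


chat_log = [
    ["today", "happy", "[smile]"],
    ["[umbrella]", "rain"],        # no standard emotion word, so "[umbrella]" is skipped
    ["hurt", "[cry]"],
]
print(extract_emoticons(chat_log, {"happy", "hurt"}, {"[smile]", "[cry]", "[umbrella]"}))
# ['[cry]', '[smile]']
```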
Step S103: calculating the emotion polarity of the expanded corpus according to the standard emotion polarities, and storing the emotion polarity in the standard emotion corpus in association with the expanded corpus.
Specifically, when the expanded corpus includes expansion words, each expansion word $ESW_i$ can be represented by a group of standard emotion words. The word vector similarity between each standard emotion word and the expansion word is obtained as the sampling similarity, and the standard emotion words whose sampling similarity is greater than the third preset similarity are taken as sampling standard emotion words, that is,

$$ESW_i = \{\langle BSW_{i1}, S_{i1}\rangle, \langle BSW_{i2}, S_{i2}\rangle, \ldots, \langle BSW_{in}, S_{in}\rangle\}$$

where $BSW_{ij}$ denotes the j-th standard emotion word whose similarity to $ESW_i$ is greater than the third preset similarity, and $S_{ij}$ denotes that similarity. The emotion polarity of each expansion word is calculated as follows:

$$P(ESW_i) = \begin{cases} 1, & \text{if } \sum_{j=1}^{n} S_{ij}\, P(BSW_{ij}) > 0 \\ -1, & \text{if } \sum_{j=1}^{n} S_{ij}\, P(BSW_{ij}) < 0 \end{cases}$$

Here $P(BSW_{ij})$ is the polarity of the standard emotion word $BSW_{ij}$, i.e. the sampling standard emotion polarity, with 1 for positive and -1 for negative. The emotion polarity $P(ESW_i)$ of the expansion word $ESW_i$ is thus determined by the accumulated sum of the products of the polarities of all its corresponding standard emotion words and their similarities to it.
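The polarity rule for an expansion word can be sketched directly from the formula for $P(ESW_i)$. In this Python sketch, the input is a mapping from each standard emotion word to its sampling similarity with the expansion word. The description does not say what happens when the weighted sum is exactly zero, so returning `None` in that case is an assumption. The words "glad" and "angry" and all similarity values are hypothetical.

```python
from typing import Dict, Optional


def expansion_word_polarity(sampling_similarity: Dict[str, float],
                            standard_polarity: Dict[str, int],
                            third_threshold: float) -> Optional[int]:
    """Sign of sum_j S_ij * P(BSW_ij) over the sampling standard emotion words.

    Only standard emotion words whose similarity exceeds the third preset
    similarity (the BSW_ij of the formula) contribute to the sum.
    """
    total = sum(sim * standard_polarity[word]
                for word, sim in sampling_similarity.items()
                if sim > third_threshold)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return None  # assumption: a zero sum is left undecided


sims = {"happy": 0.8, "glad": 0.6, "hurt": 0.55, "angry": 0.3}
polarity = {"happy": 1, "glad": 1, "hurt": -1, "angry": -1}
print(expansion_word_polarity(sims, polarity, 0.5))  # 1 ("angry" is below the threshold)
```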
Further, when the expanded corpus includes expanded emoticons, the standard emotion words that appear in the same sentence as each expanded emoticon $SE_i$ are taken as its sampling standard emotion words. Let $SW_j$ denote the j-th emotion word co-occurring with the emoticon $SE_i$. The emotion polarity of the expanded emoticon $SE_i$ is then calculated as:

$$P(SE_i) = \sum_{j} P(SW_j)$$

where $P(SW_j)$ is the polarity of the emotion word $SW_j$, with 1 for positive and -1 for negative.
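The emoticon polarity $P(SE_i)$ is a plain sum, which the Python sketch below computes from tokenized sentences. The description does not say how repeated co-occurrences are handled. This sketch assumes that every standard emotion word token sharing a sentence with the emoticon contributes once, so a word seen with the emoticon in several sentences contributes several times. The function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def emoticon_polarity(emoticon: str,
                      sentences: Iterable[List[str]],
                      standard_polarity: Dict[str, int]) -> int:
    """P(SE_i): sum of the polarities of standard emotion words co-occurring with the emoticon."""
    total = 0
    for tokens in sentences:
        if emoticon in tokens:
            # Every standard emotion word token in this sentence contributes its polarity once.
            total += sum(standard_polarity[t] for t in tokens if t in standard_polarity)
    return total


sentences = [["happy", "[smile]"], ["glad", "happy", "[smile]"], ["hurt", "[smile]"], ["hurt", "[cry]"]]
polarity = {"happy": 1, "glad": 1, "hurt": -1}
print(emoticon_polarity("[smile]", sentences, polarity))  # 2
```

Unlike expansion words, the result is not clipped to 1 or -1. Its magnitude reflects how many polar co-occurrences outweigh the others.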
Step S104: acquiring the emotion category of the expanded corpus according to the standard emotion categories, and storing the emotion category in the standard emotion corpus in association with the expanded corpus.
Specifically, when the expanded corpus includes expansion words, each expansion word $ESW_i$ is again represented by its sampling standard emotion words. These are the standard emotion words whose word vector similarity to it (the sampling similarity) is greater than the third preset similarity:

$$ESW_i = \{\langle BSW_{i1}, S_{i1}\rangle, \ldots, \langle BSW_{in}, S_{in}\rangle\}$$

where $BSW_{ij}$ is the j-th such standard emotion word and $S_{ij}$ the corresponding similarity. The emotion category of each expansion word is calculated as follows:

$$C(ESW_i) = \arg\max_{c \in C} \sum_{j=1}^{n} \mathbb{1}\left[C(BSW_{ij}) = c\right], \quad C = \{\text{happy}, \text{good}, \text{fear}, \text{anger}, \text{fright}, \text{aversion}, \text{sad}\}$$

Here $C(BSW_{ij})$ is the emotion category of the standard emotion word $BSW_{ij}$, i.e. one of the categories in C. The emotion categories of all standard emotion words corresponding to the expansion word $ESW_i$ are tallied. The category that occurs most often is taken as the emotion category of the expansion word, denoted $C(ESW_i)$.
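The category rule for an expansion word is a majority vote over the sampling standard emotion words, which `collections.Counter` expresses directly. In this Python sketch, the function name and sample data are hypothetical. How ties are broken is not stated in the description; `Counter.most_common` returns equally frequent categories in first-seen order, so this sketch breaks ties that way.

```python
from collections import Counter
from typing import Dict, Optional


def expansion_word_category(sampling_similarity: Dict[str, float],
                            standard_category: Dict[str, str],
                            third_threshold: float) -> Optional[str]:
    """C(ESW_i): the most frequent category among the sampling standard emotion words."""
    votes = Counter(standard_category[word]
                    for word, sim in sampling_similarity.items()
                    if sim > third_threshold)
    if not votes:
        return None  # no standard emotion word passed the third preset similarity
    return votes.most_common(1)[0][0]


sims = {"happy": 0.8, "glad": 0.7, "hurt": 0.6, "angry": 0.2}
categories = {"happy": "happy", "glad": "happy", "hurt": "sad", "angry": "anger"}
print(expansion_word_category(sims, categories, 0.5))  # happy
```

Unlike the polarity rule, the vote ignores the similarity values once a word passes the threshold. Each sampling standard emotion word counts equally.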
Further, when the expanded corpus includes expanded emoticons, the emotion significance and emotion correlation of each expanded emoticon are obtained according to the standard emotion categories. The emotion significance represents how strongly the expanded emoticon expresses each emotion category. The emotion correlation represents how well the expanded emoticon distinguishes between emotion categories.
The emotion significance $ESS_{ij}$ is the salience of the i-th emoticon toward the j-th emotion category. It is calculated from two counts:
- $count(SE_i, CSW_j)$, the number of co-occurrences of the i-th emoticon with words of the j-th emotion category (the first co-occurrence count);
- $count(SE, CSW_j)$, the number of co-occurrences of all emoticons with words of the j-th emotion category (the second co-occurrence count).
The emotion correlation $ESR_i$ is the emotion category relevance of the i-th emoticon. It is calculated from two counts:
- $count(SE, CSW)$, the number of co-occurrences of all emoticons with all emotion category words in the dataset (the third co-occurrence count);
- $count(SE_i, CSW)$, the number of co-occurrences of the i-th emoticon with all emotion category words (the fourth co-occurrence count).
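The four co-occurrence counts that feed $ESS_{ij}$ and $ESR_i$ can be gathered in a single pass over the sentence sample library, as in the Python sketch below. It assumes that each (emoticon token, emotion word token) pair within a sentence counts as one co-occurrence. The sketch stops at the counts and leaves open how they are combined into $ESS_{ij}$ and $ESR_i$. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cooccurrence_counts(sentences: Iterable[List[str]],
                        emoticons: Set[str],
                        word_category: Dict[str, str]):
    """Return the first to fourth co-occurrence counts.

    first[(e, c)] = count(SE_i, CSW_j): emoticon e with words of category c
    second[c]     = count(SE, CSW_j):   all emoticons with words of category c
    third         = count(SE, CSW):     all emoticons with all category words
    fourth[e]     = count(SE_i, CSW):   emoticon e with all category words
    """
    first: Dict[Tuple[str, str], int] = defaultdict(int)
    second: Dict[str, int] = defaultdict(int)
    fourth: Dict[str, int] = defaultdict(int)
    third = 0
    for tokens in sentences:
        sentence_emoticons = [t for t in tokens if t in emoticons]
        sentence_categories = [word_category[t] for t in tokens if t in word_category]
        # Each (emoticon token, emotion word token) pair in a sentence is one co-occurrence.
        for e in sentence_emoticons:
            for c in sentence_categories:
                first[(e, c)] += 1
                second[c] += 1
                fourth[e] += 1
                third += 1
    return dict(first), dict(second), third, dict(fourth)


sentences = [["happy", "[smile]"], ["glad", "happy", "[smile]"], ["hurt", "[smile]", "[cry]"]]
first, second, third, fourth = cooccurrence_counts(
    sentences, {"[smile]", "[cry]"}, {"happy": "happy", "glad": "happy", "hurt": "sad"})
print(third, second, fourth)  # 5 {'happy': 3, 'sad': 2} {'[smile]': 4, '[cry]': 1}
```

Under this counting convention, the second counts and the fourth counts each sum to the third count.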
The final emotion category tendency is calculated as follows:

$$ESC_{ij} = ESS_{ij} \times ESR_i$$
The application calculates the tendency $ESC_{ij}$ of each emoticon toward every emotion category and sorts the categories by tendency in ascending order. The category with the highest tendency is selected as the emotion category of the emoticon.
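Given $ESS_{ij}$ for every category and $ESR_i$ for one emoticon, the final selection can be sketched in Python as below. The inputs are assumed to be already computed. The function name and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def emoticon_category(ess_row: Dict[str, float],
                      esr: float) -> Tuple[str, List[Tuple[str, float]]]:
    """Compute ESC_ij = ESS_ij * ESR_i for every category j and pick the highest.

    Returns the chosen category and the (category, tendency) list sorted in ascending order.
    """
    ranked = sorted(((c, s * esr) for c, s in ess_row.items()), key=lambda item: item[1])
    return ranked[-1][0], ranked


best, ranking = emoticon_category({"happy": 0.6, "sad": 0.1, "anger": 0.3}, esr=2.0)
print(best)  # happy
```

$ESR_i$ does not depend on j, so it scales every tendency of emoticon i by the same factor. With a positive $ESR_i$, the selected category is therefore the one with the largest $ESS_{ij}$.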
Compared with the prior art, the first embodiment of the application works as follows once the preset standard emotion corpus has been obtained:
- An expanded corpus is acquired automatically from the standard emotion words in the standard emotion corpus, and the newly added expanded corpus is stored in that corpus.
- The emotion polarity of the expanded corpus is calculated from the standard emotion polarities of the standard emotion words in the standard emotion corpus.
- The emotion category of the expanded corpus is calculated from their standard emotion categories.

The standard emotion corpus is thus updated and expanded autonomously, reducing the manpower required.
The second embodiment of the present application relates to an emotion corpus expansion device. As shown in FIG. 4, it includes:
- at least one processor 401; and
- a memory 402 communicatively connected to the at least one processor 401.

The memory 402 stores instructions executable by the at least one processor 401. Executing these instructions enables the at least one processor 401 to perform the emotion corpus expansion method described above.
The memory 402 and the processor 401 are connected by a bus. The bus may comprise any number of interconnected buses and bridges linking together the various circuits of the one or more processors 401 and the memory 402. It may also connect various other circuits, such as peripherals, voltage regulators and power management circuits; these are well known in the art and are therefore not described further here.

A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, such as several receivers and transmitters, and provides a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 401 is transmitted over a wireless medium via an antenna, and the antenna also receives data and passes it to the processor 401.
The processor 401 is responsible for managing the bus and for general processing. It may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory 402 may be used to store data used by the processor 401 when performing operations.
A third embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium. The program includes several instructions that cause a device (such as a single-chip microcomputer or a chip) or a processor to perform all or part of the steps of the methods in the embodiments of the application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (8)

1. An emotion corpus expansion method is characterized by comprising the following steps:
the method comprises the steps of obtaining a standard emotion corpus, wherein the standard emotion corpus comprises a plurality of standard emotion words, and standard emotion polarities and standard emotion categories which are stored corresponding to the standard emotion words;
acquiring an expanded corpus according to the standard emotion words, and adding and storing the expanded corpus into the standard emotion corpus;
calculating emotion polarity of the expanded corpus according to the standard emotion polarity, and storing the emotion polarity of the expanded corpus and the expanded corpus in a standard emotion corpus in an associated manner;
the emotion polarity of the expanded corpus is calculated according to the standard emotion polarity, and the emotion polarity calculation method specifically comprises the following steps: acquiring word vector similarity of each standard emotion word and the expansion word as sampling similarity; acquiring a plurality of standard emotion words with the sampling similarity being greater than a third preset similarity as sampling standard emotion words; acquiring the standard emotion polarity corresponding to the sampling standard emotion word as a sampling standard emotion polarity; calculating the product of the sampling similarity corresponding to each sampling standard emotion word and the sampling standard emotion polarity; accumulating the products; if the accumulated result is positive, the emotion polarity of the expansion word is 1; if the accumulated result is negative, the emotion polarity of the expansion word is-1; the third preset similarity is a preset similarity;
acquiring the emotion category of the expanded corpus according to the standard emotion categories, and storing the emotion category of the expanded corpus in the standard emotion corpus in association with the expanded corpus; wherein acquiring the emotion category of the expanded corpus according to the standard emotion categories specifically comprises: acquiring the standard emotion category corresponding to each sampling standard emotion word as a sampling standard emotion category; and taking the most frequent emotion category among the sampling standard emotion categories as the emotion category of the expansion word.
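The polarity and category rules of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: word vectors are assumed to be plain lists of floats compared by cosine similarity (the claim says only "word vector similarity"), standard polarities are assumed to be 1 or -1, and the names `label_expansion_word` and `third_sim` are hypothetical. The claim does not cover an accumulated result of exactly zero or a tie between categories, so the sketch returns 0 in the first case and keeps the first category counted in the second.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_expansion_word(word_vec, lexicon, vectors, third_sim):
    """Return (polarity, category) for one expansion word, or (None, None).

    lexicon:   {standard_word: (polarity, category)} with polarity in {1, -1}
    vectors:   {standard_word: word vector}
    third_sim: the "third preset similarity" threshold (hypothetical name)
    """
    # Sampling standard emotion words: sampling similarity above the threshold.
    samples = []
    for word, (polarity, category) in lexicon.items():
        sim = cosine(word_vec, vectors[word])
        if sim > third_sim:
            samples.append((sim, polarity, category))
    if not samples:
        return None, None

    # Accumulate similarity x standard polarity over the sampled words.
    score = sum(sim * pol for sim, pol, _ in samples)
    # The claim covers only positive and negative totals; 0 is this sketch's choice.
    polarity = 1 if score > 0 else (-1 if score < 0 else 0)

    # Most frequent sampled category; ties go to the first one counted (assumption).
    category = Counter(cat for _, _, cat in samples).most_common(1)[0][0]
    return polarity, category


lexicon = {"happy": (1, "joy"), "glad": (1, "joy"), "sad": (-1, "sorrow")}
vectors = {"happy": [1.0, 0.1], "glad": [0.9, 0.2], "sad": [-1.0, 0.0]}
print(label_expansion_word([1.0, 0.15], lexicon, vectors, 0.5))  # → (1, 'joy')
```

Weighting each sampled polarity by its similarity means that close neighbours dominate the sign, while the category is decided by a plain majority vote over the same sampled words.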
2. The emotion corpus expansion method according to claim 1, wherein the expanded corpus comprises the expansion words, and acquiring the expanded corpus according to the standard emotion words specifically comprises:
acquiring, as candidate words, words whose word vector similarity to the standard emotion words is greater than a first preset similarity, thereby obtaining a plurality of candidate words;
acquiring the word vector similarity between each candidate word and each of the other candidate words as a candidate similarity, thereby obtaining a plurality of candidate similarities for each candidate word;
acquiring, for each candidate word, the number of its candidate similarities that are greater than a second preset similarity as the candidate number of that candidate word; and
taking the candidate words whose candidate number is greater than a preset threshold as the expansion words.
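The candidate filtering of claim 2 can be sketched as follows. It is a hedged illustration, not the authors' code: it assumes cosine similarity over plain float vectors, assumes a word becomes a candidate when it exceeds the first threshold for at least one standard emotion word, and assumes standard words are not re-proposed as candidates (the claim is silent on both points). `select_expansion_words`, `vocab_vecs` and `threshold` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_expansion_words(std_vecs, vocab_vecs, first_sim, second_sim, threshold):
    """Filter vocabulary words into expansion words following the steps of claim 2.

    std_vecs:   {standard_emotion_word: vector}
    vocab_vecs: {word: vector}, the pool candidates are drawn from (assumed)
    """
    # Candidate words: similarity to at least one standard word exceeds first_sim.
    # Standard words themselves are skipped (an assumption; the claim is silent).
    candidates = {
        w: v
        for w, v in vocab_vecs.items()
        if w not in std_vecs and any(cosine(v, s) > first_sim for s in std_vecs.values())
    }

    expansion = []
    for w, v in candidates.items():
        # Candidate number: how many *other* candidates exceed second_sim against w.
        count = sum(
            1
            for other, ov in candidates.items()
            if other != w and cosine(v, ov) > second_sim
        )
        # Keep candidates whose candidate number exceeds the preset threshold.
        if count > threshold:
            expansion.append(w)
    return expansion
```

The effect of the second pass is to keep candidates that sit among several other similar candidates and to drop isolated ones. The pairwise loop is quadratic in the number of candidates, which is acceptable for a sketch but would need an index for a large vocabulary.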
3. The emotion corpus expansion method according to claim 1, wherein the expanded corpus comprises extended emoticons, and acquiring the expanded corpus according to the standard emotion words specifically comprises:
acquiring a sentence sample library, wherein the sentence sample library comprises a plurality of sentences; and
acquiring the emoticons that appear in the same sentence as a standard emotion word as the extended emoticons.
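A minimal sketch of the emoticon extraction in claim 3 follows. The claim does not say how sentences are tokenized or how an emoticon is recognized, so the sketch assumes pre-tokenized sentences and a known inventory of emoticon tokens; `extract_emoticons` and `emoticon_vocab` are hypothetical names.

```python
def extract_emoticons(sentences, standard_words, emoticon_vocab):
    """Return the emoticons that share at least one sentence with a standard emotion word.

    sentences:      the sentence sample library, each sentence a list of tokens
    standard_words: set of standard emotion words
    emoticon_vocab: set of tokens treated as emoticons (assumed to be given)
    """
    found = set()
    for tokens in sentences:
        present = set(tokens)
        # Only sentences containing a standard emotion word contribute emoticons.
        if present & standard_words:
            found |= present & emoticon_vocab
    return sorted(found)


library = [["so", "happy", ":)"], ["meh", ":|"], ["sad", "day", ":("]]
print(extract_emoticons(library, {"happy", "sad"}, {":)", ":(", ":|"}))  # → [':(', ':)']
```

Here ":|" is not extracted because its only sentence contains no standard emotion word.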
4. The emotion corpus expansion method according to claim 3, wherein calculating the emotion polarity of the expanded corpus according to the standard emotion polarities specifically comprises:
acquiring the standard emotion words that appear in the same sentence as the extended emoticon as sampling standard emotion words;
acquiring the standard emotion polarity corresponding to each sampling standard emotion word as a sampling standard emotion polarity; and
calculating the sum of the sampling standard emotion polarities as the emotion polarity of the extended emoticon.
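The summation in claim 4 can be sketched as below, again assuming pre-tokenized sentences and standard polarities of 1 or -1. The claim can be read as summing over the distinct sampling standard emotion words or over every co-occurrence; this sketch assumes the distinct-word reading. `emoticon_polarity` and `polarity_of` are hypothetical names.

```python
def emoticon_polarity(emoticon, sentences, polarity_of):
    """Sum the standard polarities of the standard words co-occurring with an emoticon.

    sentences:   list of token lists (the sentence sample library)
    polarity_of: {standard_emotion_word: polarity}, polarity in {1, -1}
    """
    # Sampling standard emotion words: distinct standard words that appear in at
    # least one sentence together with the emoticon (distinct-word reading assumed).
    sampled = {
        tok
        for tokens in sentences
        if emoticon in tokens
        for tok in tokens
        if tok in polarity_of
    }
    return sum(polarity_of[w] for w in sampled)


library = [["so", "happy", ":)"], ["glad", "happy", ":)"], ["sad", ":)"]]
polarity_of = {"happy": 1, "glad": 1, "sad": -1}
print(emoticon_polarity(":)", library, polarity_of))  # → 1
```

Under a frequency-weighted reading, "happy" would be counted once per sentence and the same library would give 2 instead of 1. Unlike claim 1, claim 4 does not state that the sum is mapped to 1 or -1.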
5. The emotion corpus expansion method according to claim 3, wherein acquiring the emotion category of the expanded corpus according to the standard emotion categories specifically comprises:
acquiring the emotion significance and the emotion correlation of the extended emoticon according to the standard emotion categories, wherein the emotion significance represents how strongly the extended emoticon expresses each emotion category, and the emotion correlation represents the ability of the extended emoticon to distinguish between emotion categories; and
acquiring the emotion category of the extended emoticon according to the product of the emotion significance and the emotion correlation.
6. The emotion corpus expansion method according to claim 5, wherein acquiring the emotion significance and the emotion correlation of the extended emoticons according to the standard emotion categories specifically comprises:
acquiring the number of times each extended emoticon appears in the same sentence as the standard emotion words of each emotion category, as the first co-occurrence count for that extended emoticon and that emotion category;
acquiring the number of times all the extended emoticons appear in the same sentence as the standard emotion words of each emotion category, as the second co-occurrence count for that emotion category;
acquiring the number of times all the extended emoticons appear in the same sentence as the standard emotion words of all emotion categories, as the third co-occurrence count;
acquiring the number of times each extended emoticon appears in the same sentence as the standard emotion words of all emotion categories, as the fourth co-occurrence count; and
acquiring the emotion significance according to the first and second co-occurrence counts, and acquiring the emotion correlation according to the third and fourth co-occurrence counts.
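The counting in claims 5 and 6 can be sketched as follows. The claims name the inputs of the emotion significance and emotion correlation but not their formulas, so the two scores in this sketch are hypothetical placeholders: significance is taken as the first count divided by the second, and correlation as an IDF-style `log(1 + third / fourth)`. The sketch also assumes that each sentence contributes at most one co-occurrence per (emoticon, category) pair and that the second, third and fourth counts are sums of the first. `classify_emoticons` and `category_of` are hypothetical names.

```python
import math
from collections import defaultdict


def classify_emoticons(sentences, emoticons, category_of):
    """Assign an emotion category to each extended emoticon (claims 5-6 sketch).

    sentences:   list of token lists
    emoticons:   set of extended emoticons
    category_of: {standard_word: category}
    """
    # n1[e][c]: sentences in which emoticon e co-occurs with a word of category c.
    n1 = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        cats = {category_of[t] for t in tokens if t in category_of}
        for e in set(tokens) & emoticons:
            for c in cats:
                n1[e][c] += 1

    # Remaining counts, read here as sums of the first counts (assumption).
    n2 = defaultdict(int)  # second count: per category, over all emoticons
    n4 = defaultdict(int)  # fourth count: per emoticon, over all categories
    for e, row in n1.items():
        for c, k in row.items():
            n2[c] += k
            n4[e] += k
    n3 = sum(n2.values())  # third count: all emoticons, all categories

    result = {}
    for e, row in n1.items():
        # HYPOTHETICAL formulas; the claims only say which counts each score uses.
        correlation = math.log(1 + n3 / n4[e])
        scores = {c: (k / n2[c]) * correlation for c, k in row.items()}
        # Category with the largest significance x correlation product.
        result[e] = max(scores, key=scores.get)
    return result


category_of = {"happy": "joy", "glad": "joy", "sad": "sorrow"}
library = [["happy", ":)"], ["glad", ":)"], ["sad", ":("], ["sad", ":(", "happy"]]
print(classify_emoticons(library, {":)", ":("}, category_of))  # → {':)': 'joy', ':(': 'sorrow'}
```

Emoticons that never co-occur with a standard emotion word receive no category in this sketch. Because the assumed correlation depends only on the emoticon, any other formula built from the same counts can be swapped in without changing the counting code.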
7. An emotion corpus expansion device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the emotion corpus expansion method according to any one of claims 1 to 6.
8. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the emotion corpus expansion method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010248850.9A CN111552815B (en) 2020-04-01 2020-04-01 Emotion corpus expansion method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010248850.9A CN111552815B (en) 2020-04-01 2020-04-01 Emotion corpus expansion method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111552815A (en) 2020-08-18
CN111552815B (en) 2023-11-17

Family

ID=72005506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010248850.9A Active CN111552815B (en) 2020-04-01 2020-04-01 Emotion corpus expansion method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111552815B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342944B (en) * 2021-04-29 2023-04-07 Tencent Technology (Shenzhen) Co., Ltd. Corpus generalization method, apparatus, device and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107862087A (en) * 2017-12-01 2018-03-30 Guangzhou Jianyixun Information Technology Co., Ltd. Sentiment analysis method, apparatus and storage medium based on big data and deep learning
WO2019042450A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Natural language processing method and apparatus

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
WO2019042450A1 (en) * 2017-09-04 2019-03-07 Huawei Technologies Co., Ltd. Natural language processing method and apparatus
CN107862087A (en) * 2017-12-01 2018-03-30 Guangzhou Jianyixun Information Technology Co., Ltd. Sentiment analysis method, apparatus and storage medium based on big data and deep learning

Non-Patent Citations (1)

Title
Calculation of word sentiment polarity based on HowNet and PMI; Wang Zhenyu et al.; Computer Engineering; 2012-08-05 (No. 15); full text *

Also Published As

Publication number Publication date
CN111552815A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN110993081B (en) Doctor online recommendation method and system
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN106156204B (en) Text label extraction method and device
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN106886512B (en) Article classification method and device
CN112559684A (en) Keyword extraction and information retrieval method
US11429810B2 (en) Question answering method, terminal, and non-transitory computer readable storage medium
CN109783801B (en) Electronic device, multi-label classification method and storage medium
CN111159359A (en) Document retrieval method, document retrieval device and computer-readable storage medium
CN112579729B (en) Training method and device for document quality evaluation model, electronic equipment and medium
WO2023010427A1 (en) Systems and methods generating internet-of-things-specific knowledge graphs, and search systems and methods using such graphs
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN113673223A (en) Keyword extraction method and system based on semantic similarity
CN111552815B (en) Emotion corpus expansion method and device and computer readable storage medium
CN112270178A (en) Medical literature cluster theme determination method and device, electronic equipment and storage medium
CN109657052B (en) Method and device for extracting fine-grained knowledge elements contained in paper abstract
CN110969005B (en) Method and device for determining similarity between entity corpora
CN113239150B (en) Text matching method, system and equipment
CN106021225B (en) A kind of Chinese Maximal noun phrase recognition methods based on the simple noun phrase of Chinese
CN109300550B (en) Medical data relation mining method and device
CN115563242A (en) Automobile information screening method and device, electronic equipment and storage medium
CN108241650B (en) Training method and device for training classification standard
CN113886521A (en) Text relation automatic labeling method based on similar vocabulary
CN112115237B (en) Construction method and device of tobacco science and technology literature data recommendation model
CN115481255A (en) Multi-label text classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant