WO2020253042A1 - Emotional intelligence judgment method, device, and computer-readable storage medium - Google Patents

Emotional intelligence judgment method, device, and computer-readable storage medium

Info

Publication number
WO2020253042A1
WO2020253042A1 (PCT/CN2019/117336, CN2019117336W)
Authority
WO
WIPO (PCT)
Prior art keywords
corpus
word
words
training
value
Prior art date
Application number
PCT/CN2019/117336
Other languages
English (en)
French (fr)
Inventor
金戈
徐亮
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2020253042A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G06F 16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G06F 16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an emotional intelligence judgment method, device, and computer-readable storage medium.
  • the present application provides an emotional intelligence judgment method, device, and computer-readable storage medium, whose main purpose is to judge the emotional tendency of text data input by a user.
  • an emotional intelligence judgment method provided by this application includes:
  • receiving a corpus and a label set that include a basic data set and a scene data set, and performing preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus;
  • performing keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
  • inputting the word vector set into the convolutional neural network of a sentiment analysis model and the label set into the loss function of the sentiment analysis model, where the convolutional neural network receives the word vector set and is trained to obtain a training value; inputting the training value into the loss function, which computes a loss value from the label set and the training value; and comparing the loss value with a preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training;
  • receiving text data input by a user, inputting the text data into the sentiment analysis model to judge its emotional tendency, and outputting the judgment result.
  • the present application also provides an emotional intelligence judgment device, which includes a memory and a processor.
  • the memory stores an emotional intelligence judgment program that can run on the processor; when the emotional intelligence judgment program is executed by the processor, the following steps are implemented:
  • receiving a corpus and a label set that include a basic data set and a scene data set, and performing preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus;
  • performing keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
  • inputting the word vector set into the convolutional neural network of a sentiment analysis model and the label set into the loss function of the sentiment analysis model, where the convolutional neural network receives the word vector set and is trained to obtain a training value; inputting the training value into the loss function, which computes a loss value from the label set and the training value; and comparing the loss value with a preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training;
  • receiving text data input by a user, inputting the text data into the sentiment analysis model to judge its emotional tendency, and outputting the judgment result.
  • the present application also provides a computer-readable storage medium storing an emotional intelligence judgment program that can be executed by one or more processors to implement the steps of the emotional intelligence judgment method described above.
  • This application uses a convolutional neural network to perform sentiment judgments of text.
  • the convolutional neural network has many parameters and strong representational power, so it can be used to extract abstract features from text; the extracted features generalize better than manually formulated ones, which makes them better suited to building the model of this application and improves the accuracy of emotion judgment. The emotional intelligence judgment method, device, and computer-readable storage medium described in this application can therefore realize an efficient emotional intelligence judgment function.
  • FIG. 1 is a schematic flowchart of an emotional intelligence judgment method provided by an embodiment of this application
  • FIG. 2 is a schematic diagram of the internal structure of an emotional intelligence judgment device provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of modules of an emotional intelligence judgment program in an emotional intelligence judgment device provided by an embodiment of the application.
  • This application provides an emotional intelligence judgment method.
  • FIG. 1 it is a schematic flowchart of an emotional intelligence judgment method provided by an embodiment of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the emotional intelligence judgment method includes:
  • S1: Receive a corpus and a label set that include a basic data set and a scene data set, and perform preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus.
  • the basic data set includes a collection of Weibo comments, a collection of movie and TV reviews, and the like.
  • the Weibo comment set includes 40,000 Weibo comments: 15,000 with a happy emotional tendency, 15,000 with a sad emotional tendency, and 10,000 with no obvious happy or sad emotional tendency.
  • the review collection of movies and TV shows is similar to the collection of Weibo comments, and will not be repeated.
  • the scene data set includes a stock comment collection, a government work report comment collection, and a company financial statement comment collection.
  • the scene data set uses the same emotional division as the Weibo comment set: both can be divided into subsets with happy, sad, and no obviously happy or sad emotional tendencies.
  • the tag set in the preferred embodiment of the present application includes three emotional tags of happy, sad, and normal, and the normal indicates that there is no obvious emotional tendency of happiness or sadness.
  • the word segmentation includes: establishing a probabilistic word segmentation model P(S) from the corpus, maximizing the model P(S), and using the maximized model P(S) to perform the word segmentation operation on the corpus.
  • the probabilistic word segmentation model P(S) is the product of the bigram probabilities p(W_i|W_{i-1}) over the corpus, where W_1, W_2, …, W_m are the words included in the corpus, m is the number of words, and p(W_i|W_{i-1}) is the probability that word W_i appears given that word W_{i-1} has appeared.
  • in the maximized model, p(W_i|W_{i-1}) is estimated as count(W_{i-1}, W_i)/count(W_{i-1}), where count(W_{i-1}, W_i) is the number of texts in the corpus in which words W_{i-1} and W_i appear together, count(W_{i-1}) is the number of texts in which word W_{i-1} appears, and argmax denotes the maximization operation.
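A minimal sketch of the bigram model above, assuming the product-of-bigrams form implied by the variable definitions; the tiny corpus and candidate segmentations are illustrative only:

```python
from collections import Counter

def train_bigram(texts):
    """Estimate p(W_i | W_{i-1}) = count(W_{i-1}, W_i) / count(W_{i-1}), where
    count(.) follows the text's definition: the number of texts in which the
    word (or the word pair) appears."""
    doc_count, pair_count = Counter(), Counter()
    for words in texts:
        unique = set(words)
        doc_count.update(unique)
        pair_count.update((a, b) for a in unique for b in unique if a != b)
    def p(prev, cur):
        return pair_count[(prev, cur)] / doc_count[prev] if doc_count[prev] else 0.0
    return p

def score(segmentation, p):
    """P(S) as a product of bigram probabilities over one candidate segmentation;
    the maximization (argmax) step keeps the highest-scoring candidate."""
    prob = 1.0
    for prev, cur in zip(segmentation, segmentation[1:]):
        prob *= p(prev, cur)
    return prob

p = train_bigram([["我", "很", "高兴"], ["我", "很", "难过"]])
candidates = [["我", "很", "高兴"], ["我", "很高兴"]]
best = max(candidates, key=lambda s: score(s, p))  # -> ["我", "很", "高兴"]
```

In practice the argmax is taken over all candidate segmentations of an input sentence; here only two candidates are compared for brevity.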
  • the stop words are words that carry no real meaning in the text data and have no effect on sentiment analysis of the text, yet appear frequently; they include commonly used pronouns, prepositions, and the like.
  • stop-word removal uses a stop-word-list filtering method: the words in the corpus are matched one by one against a pre-built stop-word list; if a match succeeds, the word is a stop word and is deleted from the corpus.
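The list-filtering step above amounts to a set-membership test per word; the stop-word entries below are illustrative placeholders, a production list would be far larger:

```python
STOP_WORDS = {"的", "了", "是", "我", "在"}  # illustrative entries only

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Stop-word-list filtering: match each corpus word against the pre-built
    list and delete it from the corpus when the match succeeds."""
    return [w for w in words if w not in stop_words]

# remove_stop_words(["我", "很", "高兴"]) keeps only ["很", "高兴"]
```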
  • the keyword extraction algorithm computes, for any two words W_i and W_j in the standard corpus, a dependency correlation degree Dep(W_i, W_j), where len(W_i, W_j) is the length of the dependency path between W_i and W_j and b is a hyperparameter;
  • it also computes a gravitational value f_grav(W_i, W_j), where tfidf(W_i) and tfidf(W_j) are the term frequency-inverse document frequency scores of W_i and W_j and d is the Euclidean distance between their word vectors;
  • the weight coefficient weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j) is then sorted, and the words with the largest weights are selected as keywords.
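A hedged sketch of the weighting step. The exact Dep and f_grav equations appear only as images in the original, so both functional forms below are assumptions: Dep is taken as an inverse-power decay in the dependency-path length with hyperparameter b, and f_grav follows the usual gravity analogy of tf-idf "masses" over the squared word-vector distance d. Only the product rule weight = Dep * f_grav is stated explicitly in the text.

```python
def dep(path_len, b=1.0):
    # Assumed form: a decay in the dependency-path length controlled by the
    # hyperparameter b (the patent's exact equation is an image).
    return 1.0 / (path_len ** b)

def f_grav(tfidf_i, tfidf_j, d):
    # Gravity analogy (assumed exponent): tf-idf scores act as masses, the
    # Euclidean distance d between the two word vectors as the separation.
    return (tfidf_i * tfidf_j) / (d * d)

def weight(path_len, tfidf_i, tfidf_j, d, b=1.0):
    # weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j), as stated in the text.
    return dep(path_len, b) * f_grav(tfidf_i, tfidf_j, d)
```

Keyword selection would then sort word pairs by `weight` and keep the top-scoring words.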
  • the word vectorization operation uses the Word2Vec algorithm
  • the Word2Vec algorithm includes an input layer, a projection layer, and an output layer.
  • the input layer receives the keyword data set
  • the output layer outputs the word vector set
  • the projection layer ζ(ω, j) is computed along the Huffman path ω from the Huffman code of the j-th node, the iteration factor θ of the Word2Vec model, the sigmoid function σ, and the keyword data set X_ω.
  • the Huffman coding represents the keyword data set with different arrangements of 0 and 1 codes, following data-communication practice.
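The Huffman coding mentioned above can be sketched generically: frequent words receive short 0/1 codes, and a word's code spells out its path ω through a binary tree, which is what Word2Vec's hierarchical softmax walks node by node. This is a standard Huffman construction, not the patent's exact implementation:

```python
import heapq

def huffman_codes(freqs):
    """Build 0/1 Huffman codes for a vocabulary from word frequencies.
    Each heap entry carries (total frequency, tiebreaker, partial code table)."""
    heap = [(f, i, {w: ""}) for i, (w, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in c1.items()}   # left branch gets 0
        merged.update({w: "1" + c for w, c in c2.items()})  # right branch gets 1
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

codes = huffman_codes({"高兴": 5, "难过": 1, "正常": 1})
# the most frequent word gets the shortest code
```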
  • the convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer.
  • the convolution layer receives the word vector set and performs a convolution operation on the word vector set to obtain a convolution set.
  • v' is the convolution set
  • v is the word vector set
  • k is the size of the convolution kernel
  • s is the stride of the convolution operation
  • p is the data zero-filling matrix
  • the convolution set is input into the pooling layer, which selects the largest value of each word vector in the convolution set to form a pooled set (max pooling).
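A pure-Python sketch of the convolution and max-pooling steps above, treating the input as a 1-D sequence of values for clarity (the real model convolves over word-vector matrices); k, s, and p match the symbols defined in the text:

```python
def conv1d(v, kernel, s=1, p=0):
    """1-D convolution with kernel size k = len(kernel), stride s, and zero
    padding p; the output length is (len(v) - k + 2*p) // s + 1."""
    x = [0.0] * p + list(v) + [0.0] * p   # apply zero padding
    k = len(kernel)
    return [sum(a * b for a, b in zip(x[i:i + k], kernel))
            for i in range(0, len(x) - k + 1, s)]

def max_pool(values):
    """The pooling layer keeps the largest value of the convolution output."""
    return max(values)

out = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0])  # -> [3.0, 5.0, 7.0]
pooled = max_pool(out)                           # -> 7.0
```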
  • the preferred embodiment of the present application inputs the pooled set to a fully connected layer, and the fully connected layer outputs the training value according to an activation function.
  • the fully connected layer's activation function outputs the training value y, in which e is Euler's number.
  • the loss value E is computed from the training value x, the label set μ_j, and the number of labels m in the label set.
  • the preset training threshold is generally set to 0.01.
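The exit condition above (stop once the loss value drops below the preset 0.01 threshold) can be sketched as a loop. The one-parameter `toy_step` below is a purely illustrative stand-in for a real CNN training step, not the patent's network:

```python
def train_until_threshold(train_step, threshold=0.01, max_iters=10_000):
    """Repeat training steps until the loss value falls below the preset
    training threshold, at which point training exits."""
    loss = float("inf")
    for _ in range(max_iters):
        loss = train_step()
        if loss < threshold:
            break
    return loss

# Toy stand-in for one training step: fit a single parameter w toward a target
# under squared loss (illustrative only).
state = {"w": 0.0}

def toy_step(lr=0.1, target=3.0):
    err = state["w"] - target
    state["w"] -= lr * 2 * err            # gradient of (w - target)^2
    return (state["w"] - target) ** 2     # loss after the update

final_loss = train_until_threshold(toy_step)  # drops below 0.01 within a few steps
```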
  • for example, the user inputs a piece of text data about the sudden death of a pet dog into the sentiment analysis model; the model extracts keywords appearing in the text, such as "death" and "pet", judges from these keywords that the text expresses a sad emotional tendency, and outputs the judgment result.
  • the present application also provides an emotional intelligence judgment device.
  • FIG. 2 it is a schematic diagram of the internal structure of an emotional intelligence judgment device provided by an embodiment of this application.
  • the emotional intelligence judgment device 1 may be a PC (personal computer), a terminal device such as a smartphone, tablet computer, or portable computer, or a server.
  • the emotional intelligence judgment device 1 at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 11 may be an internal storage unit of the emotional intelligence judgment device 1, for example, the hard disk of the emotional intelligence judgment device 1.
  • the memory 11 may also be an external storage device of the emotional intelligence judgment device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device 1.
  • the memory 11 may also include both an internal storage unit of the emotional intelligence judgment device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the emotional intelligence determination device 1, such as the code of the emotional intelligence determination program 01, etc., but also to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code stored in the memory 11 or process data, for example to execute the emotional intelligence judgment program 01.
  • the communication bus 13 is used to realize the connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the device 1 and other electronic devices.
  • the device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • the optional user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the emotional intelligent judgment device 1 and to display a visualized user interface.
  • Figure 2 only shows the emotional intelligence judgment device 1 with components 11-14 and the emotional intelligence judgment program 01.
  • those skilled in the art will understand that the structure shown in Figure 2 does not limit the emotional intelligence judgment device 1; it may include fewer or more components than shown, combine certain components, or arrange components differently.
  • the emotional intelligence judgment program 01 is stored in the memory 11; when the processor 12 executes the emotional intelligence judgment program 01 stored in the memory 11, the following steps are implemented:
  • Step 1: Receive a corpus and a label set that include a basic data set and a scene data set, and perform preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus.
  • the basic data set includes a collection of Weibo comments, a collection of movie and TV reviews, and the like.
  • the Weibo comment set includes 40,000 Weibo comments: 15,000 with a happy emotional tendency, 15,000 with a sad emotional tendency, and 10,000 with no obvious happy or sad emotional tendency.
  • the review collection of movies and TV shows is similar to the collection of Weibo comments, and will not be repeated here.
  • the scene data set includes a stock comment collection, a government work report comment collection, and a company financial statement comment collection.
  • the scene data set uses the same emotional division as the Weibo comment set: both are divided into subsets with happy, sad, and no obviously happy or sad emotional tendencies.
  • the tag set in the preferred embodiment of the present application includes three emotional tags of happy, sad, and normal, and the normal indicates that there is no obvious emotional tendency of happiness or sadness.
  • the word segmentation includes establishing a probabilistic word segmentation model P(S) from the corpus, maximizing the model P(S), and using the maximized model P(S) to perform the word segmentation operation on the corpus.
  • the probabilistic word segmentation model P(S) is the product of the bigram probabilities p(W_i|W_{i-1}) over the corpus, where W_1, W_2, …, W_m are the words included in the corpus, m is the number of words, and p(W_i|W_{i-1}) is the probability that word W_i appears given that word W_{i-1} has appeared.
  • in the maximized model, p(W_i|W_{i-1}) is estimated as count(W_{i-1}, W_i)/count(W_{i-1}), where count(W_{i-1}, W_i) is the number of texts in the corpus in which words W_{i-1} and W_i appear together, count(W_{i-1}) is the number of texts in which word W_{i-1} appears, and argmax denotes the maximization operation.
  • the stop words are words that carry no real meaning in the text data and have no effect on sentiment analysis of the text, yet appear frequently; they include commonly used pronouns, prepositions, and the like.
  • the method for removing stop words is stop-word-list filtering: the words in the corpus are matched one by one against a pre-built stop-word list; if a match succeeds, the word is a stop word and is deleted from the corpus.
  • Step 2 Perform keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and perform a word vectorization operation on the keyword data set to obtain a word vector set.
  • the keyword extraction algorithm described in the preferred embodiment of the present application includes: computing the dependency correlation degree Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus, where len(W_i, W_j) is the length of the dependency path between W_i and W_j and b is a hyperparameter;
  • computing the gravitational value f_grav(W_i, W_j), where tfidf(W_i) and tfidf(W_j) are the term frequency-inverse document frequency scores of W_i and W_j and d is the Euclidean distance between their word vectors;
  • multiplying the two to obtain the weight coefficient weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j), sorting the weights, and selecting the words with the largest weights.
  • the word vectorization operation in the preferred embodiment of this application adopts the Word2Vec algorithm.
  • the Word2Vec algorithm includes an input layer, a projection layer, and an output layer.
  • the input layer receives the keyword data set, the output layer outputs the word vector set, and the projection layer ζ(ω, j) is computed along the Huffman path ω from the Huffman code of the j-th node, the iteration factor θ of the Word2Vec model, the sigmoid function σ, and the keyword data set X_ω.
  • the Huffman coding represents the keyword data set with different arrangements of 0 and 1 codes, following data-communication practice.
  • Step 3: Input the word vector set into the convolutional neural network of the sentiment analysis model and the label set into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value; the training value is input into the loss function, which computes a loss value from the label set and the training value; the loss value is compared with the preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training.
  • the convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer.
  • the convolution layer receives the word vector set and performs a convolution operation on the word vector set to obtain a convolution set.
  • v' is the convolution set
  • v is the word vector set
  • k is the size of the convolution kernel
  • s is the stride of the convolution operation
  • p is the data zero-filling matrix
  • the convolution set is input into the pooling layer, which selects the largest value of each word vector in the convolution set to form a pooled set (max pooling).
  • the preferred embodiment of the present application inputs the pooled set to a fully connected layer, and the fully connected layer outputs the training value according to an activation function.
  • the activation function outputs the training value y, in which e is Euler's number; the loss value E is then computed from the training value x, the label set μ_j, and the number of labels m in the label set; the preset training threshold is generally set to 0.01.
  • Step 4 Receive text data input by the user, input the text data into the sentiment analysis model to judge the sentiment tendency, and output the judgment result.
  • for example, the user inputs a piece of text data about the sudden death of his pet dog into the sentiment analysis model; the model extracts keywords appearing in the text, such as "death" and "pet", judges from these keywords that the text expresses a sad emotional tendency, and outputs the judgment result.
  • the emotional intelligence judgment program can also be divided into one or more modules, and the one or more modules are stored in the memory 11 and run by one or more processors (in this embodiment, The processor 12) is executed to complete the application.
  • the module referred to in the application refers to a series of computer program instruction segments that can complete specific functions, and is used to describe the execution process of the emotional intelligence judgment program in the emotional intelligence judgment device.
  • the emotional intelligence judgment program can be divided into a data receiving module 10, a data processing module 20, a model training module 30, and an emotion judgment output module 40. Illustratively:
  • the data receiving module 10 is configured to receive a corpus and a tag set including a basic data set and a scene data set, and perform preprocessing operations including word segmentation and de-stop words on the corpus to obtain a standard corpus.
  • the data processing module 20 is configured to: perform keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and perform a word vectorization operation on the keyword data set to obtain a word vector set.
  • the model training module 30 is configured to: input the word vector set into the convolutional neural network of the sentiment analysis model and the label set into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value; the training value is input into the loss function, which computes a loss value from the label set and the training value; the loss value is compared with the preset training threshold of the convolutional neural network until it falls below the threshold and the network exits training.
  • the sentiment judgment output module 40 is configured to receive text data input by a user, input the text data into the sentiment analysis model to judge the sentiment tendency, and output the judgment result.
  • the embodiment of the present application also proposes a computer-readable storage medium that stores an emotional intelligence judgment program; the program can be executed by one or more processors to implement the following operations:
  • a corpus and a tag set including a basic data set and a scene data set are received, and the corpus is subjected to preprocessing operations including word segmentation and stop words removal to obtain a standard corpus.
  • a keyword data set is obtained after keyword extraction is performed on the standard corpus based on a keyword extraction algorithm, and a word vectorization operation is performed on the keyword data set to obtain a word vector set.
  • the word vector set is input into the convolutional neural network of the sentiment analysis model, and the label set is input into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value;
  • the training value is input into the loss function, which computes a loss value from the label set and the training value; the loss value is compared with the preset training threshold of the convolutional neural network until it falls below the threshold and the network exits training;
  • text data input by the user is received, input into the sentiment analysis model to judge its emotional tendency, and the judgment result is output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An emotional intelligence judgment method, relating to the field of artificial intelligence, comprising: receiving a corpus and a label set, and preprocessing the corpus to obtain a standard corpus; performing keyword extraction and word vectorization on the standard corpus to obtain a word vector set; inputting the word vector set into the convolutional neural network of a sentiment analysis model and the label set into the loss function of the sentiment analysis model, the convolutional neural network receiving the word vector set for training to obtain a training value, the loss function computing a loss value from the label set and the training value, and comparing the loss value with a preset threshold until the convolutional neural network exits training; and producing an emotion judgment result for text data input by a user. An emotional intelligence judgment device and a computer-readable storage medium are also provided. An accurate emotional intelligence judgment function can be realized.

Description

Emotional intelligence judgment method, device, and computer-readable storage medium
Under the Paris Convention, this application claims priority to Chinese patent application No. CN 201910530889.7, filed on June 18, 2019 and entitled "Emotional intelligence judgment method, device, and computer-readable storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to an emotional intelligence judgment method, device, and computer-readable storage medium.
Background
With the rapid development of the mobile Internet, people produce large amounts of text with emotional tendencies while taking part in all kinds of online activities on mobile devices. How to quickly mine the emotional tendency of these texts, so as to effectively support decision-making by governments, enterprises, and individuals, has become a hot topic in natural language processing. However, most existing emotion judgment is based on manually formulated rules: the classification granularity is coarse, recognition is difficult, and the context of a sentence is not judged, that is, the surrounding text is not used to determine the sentence's true meaning; meanwhile, the accuracy of emotion judgment tends to stall after reaching a certain level.
Summary
This application provides an emotional intelligence judgment method, device, and computer-readable storage medium, whose main purpose is to judge the emotional tendency of text data input by a user.
To achieve the above purpose, the emotional intelligence judgment method provided by this application includes:
receiving a corpus and a label set that include a basic data set and a scene data set, and performing preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus;
performing keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
inputting the word vector set into the convolutional neural network of a sentiment analysis model and the label set into the loss function of the sentiment analysis model, where the convolutional neural network receives the word vector set and is trained to obtain a training value; inputting the training value into the loss function, which computes a loss value from the label set and the training value; and comparing the loss value with a preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training;
receiving text data input by a user, inputting the text data into the sentiment analysis model to judge its emotional tendency, and outputting the judgment result.
In addition, to achieve the above purpose, this application also provides an emotional intelligence judgment device, which includes a memory and a processor; the memory stores an emotional intelligence judgment program that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
receiving a corpus and a label set that include a basic data set and a scene data set, and performing preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus;
performing keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
inputting the word vector set into the convolutional neural network of a sentiment analysis model and the label set into the loss function of the sentiment analysis model, where the convolutional neural network receives the word vector set and is trained to obtain a training value; inputting the training value into the loss function, which computes a loss value from the label set and the training value; and comparing the loss value with a preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training;
receiving text data input by a user, inputting the text data into the sentiment analysis model to judge its emotional tendency, and outputting the judgment result.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium on which an emotional intelligence judgment program is stored; the program can be executed by one or more processors to implement the steps of the emotional intelligence judgment method described above.
This application uses a convolutional neural network for text emotion judgment. The convolutional neural network has many parameters and strong representational power, so it can be used to extract abstract features from text; the extracted features generalize better than manually formulated ones, making them better suited to building the model of this application and improving the accuracy of emotion judgment. The emotional intelligence judgment method, device, and computer-readable storage medium of this application can therefore realize an efficient emotional intelligence judgment function.
Brief description of the drawings
FIG. 1 is a schematic flowchart of an emotional intelligence judgment method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the internal structure of an emotional intelligence judgment device provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the modules of the emotional intelligence judgment program in an emotional intelligence judgment device provided by an embodiment of this application.
The realization of the purpose, functional features, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides an emotional intelligence judgment method. FIG. 1 is a schematic flowchart of an emotional intelligence judgment method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.
In this embodiment, the emotional intelligence judgment method includes:
S1: Receive a corpus and a label set that include a basic data set and a scene data set, and perform preprocessing operations, including word segmentation and stop-word removal, on the corpus to obtain a standard corpus.
In a preferred embodiment of this application, the basic data set includes a Weibo comment set, a movie and TV review set, and the like. The Weibo comment set includes 40,000 Weibo comments: 15,000 with a happy emotional tendency, 15,000 with a sad emotional tendency, and 10,000 with no obvious happy or sad emotional tendency. The movie and TV review set is similar to the Weibo comment set and is not described again.
In a preferred embodiment of this application, the scene data set includes a stock comment set, a government work report comment set, and a company financial statement comment set. The scene data set uses the same emotional division as the Weibo comment set: both can be divided into subsets with happy, sad, and no obviously happy or sad emotional tendencies.
The label set in a preferred embodiment of this application includes three emotional labels, happy, sad, and normal, where normal indicates no obvious happy or sad emotional tendency.
In a preferred embodiment of this application, the word segmentation includes: establishing a probabilistic word segmentation model P(S) from the corpus, maximizing the model P(S), and using the maximized model P(S) to perform the word segmentation operation on the corpus.
Preferably, the probabilistic word segmentation model P(S) is:
Figure PCTCN2019117336-appb-000001
where W_1, W_2, …, W_m are the words included in the corpus, m is the number of words in the corpus, and p(W_i|W_{i-1}) is the probability that word W_i appears given that word W_{i-1} has appeared;
the maximization of the probabilistic word segmentation model P(S) is:
Figure PCTCN2019117336-appb-000002
where count(W_{i-1}, W_i) is the number of texts in the corpus in which words W_{i-1} and W_i appear together in the same text, count(W_{i-1}) is the number of texts in the corpus in which word W_{i-1} appears, and argmax denotes the maximization operation.
The stop words are words that carry no real meaning in the text data and have no effect on sentiment analysis of the text, yet appear frequently; they include commonly used pronouns, prepositions, and the like.
In a preferred embodiment of this application, stop-word removal uses a stop-word-list filtering method: the words in the corpus are matched one by one against a pre-built stop-word list; if a match succeeds, the word is a stop word and is deleted from the corpus.
S2: Perform keyword extraction on the standard corpus based on a keyword extraction algorithm to obtain a keyword data set, and perform a word vectorization operation on the keyword data set to obtain a word vector set.
The keyword extraction algorithm in a preferred embodiment of this application includes:
computing the dependency correlation degree Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus:
Figure PCTCN2019117336-appb-000003
where len(W_i, W_j) is the length of the dependency path between words W_i and W_j, and b is a hyperparameter;
computing the gravitational value f_grav(W_i, W_j) between any two words W_i, W_j in the standard corpus:
Figure PCTCN2019117336-appb-000004
where tfidf(W_i) and tfidf(W_j) are the term frequency-inverse document frequency scores of words W_i and W_j, and d is the Euclidean distance between the word vectors of W_i and W_j;
computing the weight coefficient weight(W_i, W_j) between any two words W_i, W_j in the standard corpus from the dependency correlation degree Dep(W_i, W_j) and the gravitational value f_grav(W_i, W_j):
weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
sorting the weight coefficients and selecting the words with the largest weight(W_i, W_j) completes the keyword extraction and yields the keyword data set.
In a preferred embodiment of this application, the word vectorization operation uses the Word2Vec algorithm, which includes an input layer, a projection layer, and an output layer. The input layer receives the keyword data set, the output layer outputs the word vector set, and the projection layer ζ(ω, j) is:
Figure PCTCN2019117336-appb-000005
where
Figure PCTCN2019117336-appb-000006
denotes the Huffman code of the j-th node on the path ω, θ is the iteration factor of the Word2Vec model, σ is the sigmoid function, and X_ω is the keyword data set.
The Huffman coding represents the keyword data set with different arrangements of 0 and 1 codes, following data-communication practice.
S3: Input the word vector set into the convolutional neural network of the sentiment analysis model and the label set into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value; the training value is input into the loss function, which computes a loss value from the label set and the training value; the loss value is compared with the preset training threshold of the convolutional neural network until the loss value is less than the preset training threshold, at which point the convolutional neural network exits training.
In a preferred embodiment of this application, the convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer receives the word vector set and performs a convolution operation on it to obtain a convolution set.
The convolution operation in a preferred embodiment of this application is:
Figure PCTCN2019117336-appb-000007
where v' is the convolution set, v is the word vector set, k is the size of the convolution kernel, s is the stride of the convolution operation, and p is the zero-padding matrix.
In a preferred embodiment of this application, the convolution set is input into the pooling layer, which selects the largest value of each word vector in the convolution set to form a pooled set.
In a preferred embodiment of this application, the pooled set is input into the fully connected layer, which outputs the training value according to an activation function. The activation function is:
Figure PCTCN2019117336-appb-000008
where y is the training value and e is Euler's number, an infinite non-repeating decimal.
The loss value E in a preferred embodiment of this application is:
Figure PCTCN2019117336-appb-000009
where x is the training value, μ_j is the label set, m is the number of labels, and the preset threshold is generally set to 0.01.
S4: Receive text data input by the user, input the text data into the sentiment analysis model to judge its emotional tendency, and output the judgment result.
For example, the user inputs a piece of text data about the sudden death of a pet dog into the sentiment analysis model; the model extracts keywords appearing in the text, such as "death" and "pet", judges from these keywords that the text expresses a sad emotional tendency, and outputs the judgment result.
This application also provides an emotional intelligence judgment device. FIG. 2 is a schematic diagram of the internal structure of an emotional intelligence judgment device provided by an embodiment of this application.
In this embodiment, the emotional intelligence judgment device 1 may be a PC (personal computer), a terminal device such as a smartphone, tablet computer, or portable computer, or a server. The emotional intelligence judgment device 1 includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments the memory 11 may be an internal storage unit of the emotional intelligence judgment device 1, for example its hard disk. In other embodiments the memory 11 may be an external storage device of the device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the device 1. The memory 11 can be used not only to store application software installed in the device 1 and various data, such as the code of the emotional intelligence judgment program 01, but also to temporarily store data that has been output or will be output.
In some embodiments the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code stored in the memory 11 or process data, for example to execute the emotional intelligence judgment program 01.
The communication bus 13 is used to realize connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the device 1 and other electronic devices.
Optionally, the device 1 may also include a user interface, which may include a display and an input unit such as a keyboard; the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display may also be appropriately called a display screen or display unit, used to show the information processed in the emotional intelligence judgment device 1 and to display a visualized user interface.
FIG. 2 only shows the emotional intelligence judgment device 1 with components 11 to 14 and the emotional intelligence judgment program 01; those skilled in the art will understand that the structure shown in FIG. 2 does not limit the device 1, which may include fewer or more components than shown, combine certain components, or arrange components differently.
In the embodiment of the device 1 shown in FIG. 2, the memory 11 stores the emotional intelligence judgment program 01; when the processor 12 executes the program 01 stored in the memory 11, the following steps are implemented:
步骤一、接收包括基础数据集和场景数据集的语料集和标签集,将所述语料集进行包括分词、去停用词的预处理操作得到标准语料集。
本申请较佳实施例中,所述基础数据集包括微博评论集、影电观后感集等。所述微博评论集包括40000条微博评论数据,其中包括高兴的情感倾向的微博评论数据15000条、难过的情感倾向的微博评论数据15000条、没有表现出明显高兴或难过的情感倾向的微博评论数据10000条。所述影电观后感集和所述微博评论集类似,不再赘述。
本申请较佳实施例中,所述场景数据集包括股票评论集、政府工作报告评论集、公司财务报表评论集,所述场景数据集与所述所述微博评论集的情感划分相同,都划分出包括高兴、难过和没有表现出明显高兴或难过的情感 倾向数据集。
本申请较佳实施例所述标签集包括高兴、难过、正常三种情感标签,所述正常表示没有表现出明显高兴或难过的情感倾向。
In a preferred embodiment of the present application, the word segmentation includes building a probabilistic segmentation model P(S) from the corpus set, maximizing the probabilistic segmentation model P(S), and performing the segmentation operation on the corpus set using the maximized probabilistic segmentation model P(S).
Preferably, the probabilistic segmentation model P(S) is:
P(S) = P(W_1, W_2, …, W_m) = p(W_1) · p(W_2|W_1) · … · p(W_m|W_(m-1))
where W_1, W_2, …, W_m are the words comprised in the corpus set, m is the number of words comprised in the corpus set, and p(W_i|W_(i-1)) denotes the probability of word W_i occurring given that word W_(i-1) has occurred.
The maximization of the probabilistic segmentation model P(S) is:
p(W_i|W_(i-1)) = argmax( count(W_(i-1), W_i) / count(W_(i-1)) )
where count(W_(i-1), W_i) denotes the number of texts in the corpus set in which word W_(i-1) and word W_i occur together in the same text, count(W_(i-1)) denotes the number of texts in the corpus set in which word W_(i-1) occurs, and argmax denotes the maximization operation.
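The count-based bigram estimate above can be sketched in Python. This is an illustrative toy under the text-level counting described (counts over texts containing the words), not the patented implementation; the sample corpus and function name are invented:

```python
def bigram_prob(corpus, prev_word, word):
    """Estimate p(word | prev_word) as count(prev_word, word) / count(prev_word),
    where counts are numbers of texts containing the word(s)."""
    pair_count = sum(1 for text in corpus if prev_word in text and word in text)
    prev_count = sum(1 for text in corpus if prev_word in text)
    return pair_count / prev_count if prev_count else 0.0

# Toy corpus: each text is a list of already-segmented words.
corpus = [
    ["深圳", "天气", "很", "好"],
    ["深圳", "天气", "不错"],
    ["北京", "天气", "很", "冷"],
]

p = bigram_prob(corpus, "深圳", "天气")
print(p)  # 1.0: every text containing "深圳" also contains "天气"
```

In a real segmenter these conditional probabilities would be multiplied along each candidate segmentation of a sentence, and the segmentation maximizing the product would be kept.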
In a preferred embodiment of the present application, stop words are words that carry little actual meaning in the text data and have little effect on the sentiment analysis of the text but occur with high frequency; they include common pronouns, prepositions, and the like.
In a preferred embodiment of the present application, the stop-word removal method is stop-word-list filtering: each word in the corpus set is matched one by one against a pre-built stop-word list; if a match succeeds, the word is a stop word and is deleted from the corpus set.
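The stop-word-list filtering described above amounts to a membership test against the list; a minimal sketch (the example words and stop-word list are invented):

```python
def remove_stopwords(words, stopword_list):
    """Match each word against the pre-built stop-word list; drop it on a match."""
    stopwords = set(stopword_list)  # set lookup instead of one-by-one scanning
    return [w for w in words if w not in stopwords]

words = ["我", "今天", "在", "公园", "很", "开心"]
stopword_list = ["我", "在", "很"]  # common pronouns / prepositions / adverbs
print(remove_stopwords(words, stopword_list))  # ['今天', '公园', '开心']
```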
Step 2: Perform keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and perform a word vectorization operation on the keyword data set to obtain a word vector set.
In a preferred embodiment of the present application, the keyword extraction algorithm includes: computing the dependency relatedness Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
Dep(W_i, W_j) = b^(len(W_i, W_j) − 1)
where len(W_i, W_j) denotes the length of the dependency path between words W_i and W_j, and b is a hyperparameter;
computing the gravitational value f_grav(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d²
where tfidf(W_i) and tfidf(W_j) denote the term frequency-inverse document frequency (TF-IDF) values of words W_i and W_j, and d denotes the Euclidean distance between the word vectors of W_i and W_j;
computing the weight coefficient weight(W_i, W_j) between any two words W_i, W_j in the standard corpus set from the dependency relatedness Dep(W_i, W_j) and the gravitational value f_grav(W_i, W_j):
weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
and sorting the weight coefficients by magnitude and selecting the words with the largest weight coefficients weight(W_i, W_j).
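A toy sketch of the weighting step above. The TF-IDF-over-squared-distance form of f_grav and the product weight = Dep · f_grav are as stated; the decaying form b^(len − 1) used here for Dep is an assumption, since the original equation image is not reproduced in this text. All numeric inputs are invented:

```python
def dep(path_len, b=0.5):
    # Dependency relatedness; the form b**(path_len - 1) is an assumption,
    # decaying as the dependency path between the two words grows.
    return b ** (path_len - 1)

def f_grav(tfidf_i, tfidf_j, d):
    # Word "gravity": product of the two TF-IDF values over squared Euclidean distance.
    return tfidf_i * tfidf_j / (d * d)

def weight(path_len, tfidf_i, tfidf_j, d, b=0.5):
    return dep(path_len, b) * f_grav(tfidf_i, tfidf_j, d)

# Rank hypothetical word pairs by weight and keep the strongest as keywords.
pairs = {
    ("离世", "爱宠"): weight(1, 0.8, 0.7, 2.0),
    ("今天", "公园"): weight(3, 0.2, 0.3, 4.0),
}
best = max(pairs, key=pairs.get)
print(best)  # ('离世', '爱宠')
```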
In a preferred embodiment of the present application, the word vectorization operation uses the Word2Vec algorithm, which includes an input layer, a projection layer, and an output layer. The input layer receives the keyword data set, the output layer outputs the word vector set, and the projection layer ζ(ω, j) is:
ζ(ω, j) = [σ(X_ω · θ_(j-1)^ω)]^(1 − d_j^ω) · [1 − σ(X_ω · θ_(j-1)^ω)]^(d_j^ω)
where d_j^ω denotes the Huffman code corresponding to the j-th node on the path ω, θ is the iteration factor of the Word2Vec model, σ denotes the sigmoid function, and X_ω is the keyword data set.
The Huffman coding represents the keyword data set with different arrangements of the codes 0 and 1, in accordance with data communication principles.
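A pure-Python sketch of the projection-layer computation, assuming the standard word2vec hierarchical-softmax form, which matches the symbols above (sigmoid σ, per-node parameters θ, Huffman code bits d_j^ω): each node on the Huffman path contributes σ(x·θ) when its code bit is 0 and 1 − σ(x·θ) when the bit is 1. The vectors and code below are invented:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node_prob(x, theta, d):
    """One Huffman-tree node: sigma(x·theta) if the code bit d is 0, else 1 - sigma(x·theta)."""
    s = sigmoid(sum(xi * ti for xi, ti in zip(x, theta)))
    return s if d == 0 else 1.0 - s

def path_prob(x, thetas, code):
    """Product of node probabilities along the Huffman path of one target word."""
    p = 1.0
    for theta, d in zip(thetas, code):
        p *= node_prob(x, theta, d)
    return p

x = [0.1, -0.2, 0.3]                          # projected input (stands in for X_ω)
thetas = [[0.5, 0.1, -0.4], [0.2, 0.2, 0.2]]  # per-node parameters (the iteration factors θ)
code = [0, 1]                                 # Huffman code bits of the target word
p = path_prob(x, thetas, code)
```

At each internal node the two branch probabilities sum to 1, which is what makes the product over a Huffman path a properly normalized output probability.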
Step 3: Input the word vector set into the convolutional neural network of a sentiment analysis model and input the label set into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value; the training value is input into the loss function, which computes a loss value based on the label set and the training value; the loss value is compared against the preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold.
In a preferred embodiment of the present application, the convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer receives the word vector set and performs a convolution operation on it to obtain a convolution set.
In a preferred embodiment of the present application, the convolution operation is:
v′ = (v − k + 2p)/s + 1
where v′ is the convolution set, v is the word vector set, k is the size of the convolution kernel, s is the stride of the convolution operation, and p is the zero-padding matrix.
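The relation above is the standard output-size formula for a 1-D convolution; a small sketch (the parameter values are invented):

```python
def conv_output_len(v, k, s, p):
    """Output length of a 1-D convolution: (v - k + 2*p) // s + 1,
    for input length v, kernel size k, stride s, and zero padding p."""
    return (v - k + 2 * p) // s + 1

print(conv_output_len(v=100, k=5, s=1, p=2))  # 100: padding of 2 preserves the length
print(conv_output_len(v=100, k=3, s=2, p=0))  # 49
```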
In a preferred embodiment of the present application, the convolution set is input into the pooling layer, which finds the word vector with the largest value among the word vectors of the convolution set to form the pooled set.
In a preferred embodiment of the present application, the pooled set is input into the fully connected layer, which outputs the training value according to an activation function. The activation function is:
y = 1/(1 + e^(−x))
where y is the training value and e is Euler's number, an infinite non-repeating decimal.
In a preferred embodiment of the present application, the loss value E is:
E = (1/m) · Σ_(j=1..m) (x − μ_j)²
where x is the training value, μ_j is the label set, m is the number of labels in the label set, and the preset threshold is typically set to 0.01.
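The training-exit rule can be sketched as a loop that stops as soon as the loss value falls below the preset threshold of 0.01. This toy uses a scalar training value and the squared-error loss form assumed above, not the patented network:

```python
def loss(x, labels):
    # Assumed squared-error form of E: mean of (x - mu_j)**2 over the m labels.
    return sum((x - mu) ** 2 for mu in labels) / len(labels)

def train(labels, threshold=0.01, lr=0.1, max_epochs=1000):
    """Toy loop: update a scalar 'training value' x and exit training
    once the loss value drops below the preset threshold."""
    x = 0.0
    target = sum(labels) / len(labels)  # minimizer of the squared loss
    for epoch in range(max_epochs):
        if loss(x, labels) < threshold:
            return x, loss(x, labels), epoch
        x += lr * (target - x)  # step toward the minimizer
    return x, loss(x, labels), max_epochs

final_x, final_loss, epochs = train([1.0, 1.0, 1.0])
```

In the actual model the update step would be backpropagation through the convolutional network, but the stopping criterion is the same comparison against the threshold.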
Step 4: Receive text data input by a user, input the text data into the sentiment analysis model to determine its sentiment tendency, and output the determination result.
For example, when a user inputs a piece of text about the sudden death of his or her beloved pet dog into the sentiment analysis model, the model extracts keywords appearing in the text such as "passed away" and "beloved pet", determines from these keywords that the text expresses a sad sentiment tendency, and outputs the determination result.
Optionally, in other embodiments, the intelligent emotion determination program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present application. A module referred to in the present application is a series of computer program instruction segments capable of completing a specific function, used to describe the execution process of the intelligent emotion determination program in the intelligent emotion determination apparatus.
For example, referring to FIG. 3, which is a schematic diagram of the program modules of the intelligent emotion determination program in an embodiment of the intelligent emotion determination apparatus of the present application, the program may be divided into a data receiving module 10, a data processing module 20, a model training module 30, and an emotion determination output module 40. Illustratively:
The data receiving module 10 is configured to: receive a corpus set, comprising a basic data set and a scenario data set, and a label set, and perform preprocessing operations comprising word segmentation and stop-word removal on the corpus set to obtain a standard corpus set.
The data processing module 20 is configured to: perform keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and perform a word vectorization operation on the keyword data set to obtain a word vector set.
The model training module 30 is configured to: input the word vector set into the convolutional neural network of a sentiment analysis model and input the label set into the loss function of the sentiment analysis model; the convolutional neural network receives the word vector set and is trained to obtain a training value; the training value is input into the loss function, which computes a loss value based on the label set and the training value; the loss value is compared against the preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold.
The emotion determination output module 40 is configured to: receive text data input by a user, input the text data into the sentiment analysis model to determine its sentiment tendency, and output the determination result.
The functions or operation steps implemented when the program modules such as the data receiving module 10, the data processing module 20, the model training module 30, and the emotion determination output module 40 are executed are substantially the same as those of the above embodiments and are not described again here.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing an intelligent emotion determination program, which can be executed by one or more processors to implement the following operations:
receiving a corpus set, comprising a basic data set and a scenario data set, and a label set, and performing preprocessing operations comprising word segmentation and stop-word removal on the corpus set to obtain a standard corpus set;
performing keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
inputting the word vector set into the convolutional neural network of a sentiment analysis model and inputting the label set into the loss function of the sentiment analysis model, wherein the convolutional neural network receives the word vector set and is trained to obtain a training value, the training value is input into the loss function, the loss function computes a loss value based on the label set and the training value, the loss value is compared against the preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold;
receiving text data input by a user, inputting the text data into the sentiment analysis model to determine its sentiment tendency, and outputting the determination result.
It should be noted that the serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. The terms "include", "comprise", and any variants thereof herein are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes that element.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (20)

  1. An intelligent emotion determination method, characterized in that the method comprises:
    receiving a corpus set, which comprises a basic data set and a scenario data set, and a label set, and performing preprocessing operations comprising word segmentation and stop-word removal on the corpus set to obtain a standard corpus set;
    performing keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
    inputting the word vector set into a convolutional neural network of a sentiment analysis model and inputting the label set into a loss function of the sentiment analysis model, wherein the convolutional neural network receives the word vector set and is trained to obtain a training value, the training value is input into the loss function, the loss function computes a loss value based on the label set and the training value, the loss value is compared against a preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold;
    receiving text data input by a user, inputting the text data into the sentiment analysis model to determine its sentiment tendency, and outputting a determination result.
  2. The intelligent emotion determination method according to claim 1, characterized in that:
    the basic data set comprises a Weibo comment set and a movie review set;
    the scenario data set comprises a stock comment set, a government work report comment set, and a company financial statement comment set;
    the label set comprises three sentiment labels: happy, sad, and neutral.
  3. The intelligent emotion determination method according to claim 1, characterized in that the word segmentation comprises:
    building a probabilistic segmentation model P(S) from the corpus set, maximizing the probabilistic segmentation model P(S), and performing the segmentation operation on the corpus set using the maximized probabilistic segmentation model P(S).
  4. The intelligent emotion determination method according to claim 3, characterized in that the probabilistic segmentation model P(S) is:
    P(S) = P(W_1, W_2, …, W_m) = p(W_1) · p(W_2|W_1) · … · p(W_m|W_(m-1))
    where W_1, W_2, …, W_m are the words comprised in the corpus set, m is the number of words comprised in the corpus set, and p(W_i|W_(i-1)) denotes the probability of word W_i occurring given that word W_(i-1) has occurred.
  5. The intelligent emotion determination method according to claim 4, characterized in that the maximization of the probabilistic segmentation model P(S) is:
    p(W_i|W_(i-1)) = argmax( count(W_(i-1), W_i) / count(W_(i-1)) )
    where count(W_(i-1), W_i) denotes the number of texts in the corpus set in which word W_(i-1) and word W_i occur together in the same text, count(W_(i-1)) denotes the number of texts in the corpus set in which word W_(i-1) occurs, and argmax denotes the maximization operation.
  6. The intelligent emotion determination method according to claim 2, characterized in that performing keyword extraction on the standard corpus set based on the keyword extraction algorithm to obtain the keyword data set comprises:
    computing the dependency relatedness Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    Dep(W_i, W_j) = b^(len(W_i, W_j) − 1)
    where len(W_i, W_j) denotes the length of the dependency path between words W_i and W_j, and b is a hyperparameter;
    computing the gravitational value f_grav(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d²
    where tfidf(W_i) and tfidf(W_j) denote the term frequency-inverse document frequency values of words W_i and W_j, and d denotes the Euclidean distance between the word vectors of W_i and W_j;
    computing the weight coefficient weight(W_i, W_j) between any two words W_i, W_j in the standard corpus set from the dependency relatedness Dep(W_i, W_j) and the gravitational value f_grav(W_i, W_j):
    weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
    and sorting the weight coefficients by magnitude and selecting the words with the largest weight coefficients weight(W_i, W_j), to complete the keyword extraction and obtain the keyword data set.
  7. The intelligent emotion determination method according to claim 6, characterized in that receiving the word vector set for training comprises performing a convolution operation and an activation operation on the word vector set;
    the convolution operation is:
    v′ = (v − k + 2p)/s + 1
    where v′ is the convolution set output by the convolution operation, v is the word vector set, k is the size of the convolution kernel, s is the stride of the convolution operation, and p is the zero-padding matrix;
    the activation function is:
    y = 1/(1 + e^(−x))
    where y is the training value and e is Euler's number, an infinite non-repeating decimal.
  8. An intelligent emotion determination apparatus, characterized in that the apparatus comprises a memory and a processor, the memory storing an intelligent emotion determination program executable on the processor, and the program, when executed by the processor, implements the following steps:
    receiving a corpus set, which comprises a basic data set and a scenario data set, and a label set, and performing preprocessing operations comprising word segmentation and stop-word removal on the corpus set to obtain a standard corpus set;
    performing keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
    inputting the word vector set into a convolutional neural network of a sentiment analysis model and inputting the label set into a loss function of the sentiment analysis model, wherein the convolutional neural network receives the word vector set and is trained to obtain a training value, the training value is input into the loss function, the loss function computes a loss value based on the label set and the training value, the loss value is compared against a preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold;
    receiving text data input by a user, inputting the text data into the sentiment analysis model to determine its sentiment tendency, and outputting a determination result.
  9. The intelligent emotion determination apparatus according to claim 8, characterized in that:
    the basic data set comprises a Weibo comment set and a movie review set;
    the scenario data set comprises a stock comment set, a government work report comment set, and a company financial statement comment set;
    the label set comprises three sentiment labels: happy, sad, and neutral.
  10. The intelligent emotion determination apparatus according to claim 8, characterized in that the word segmentation comprises building a probabilistic segmentation model P(S) from the corpus set, maximizing the probabilistic segmentation model P(S), and performing the segmentation operation on the corpus set using the maximized probabilistic segmentation model P(S).
  11. The intelligent emotion determination apparatus according to claim 10, wherein the probabilistic segmentation model P(S) is:
    P(S) = P(W_1, W_2, …, W_m) = p(W_1) · p(W_2|W_1) · … · p(W_m|W_(m-1))
    where W_1, W_2, …, W_m are the words comprised in the corpus set, m is the number of words comprised in the corpus set, and p(W_i|W_(i-1)) denotes the probability of word W_i occurring given that word W_(i-1) has occurred.
  12. The intelligent emotion determination apparatus according to claim 11, wherein the maximization of the probabilistic segmentation model P(S) is:
    p(W_i|W_(i-1)) = argmax( count(W_(i-1), W_i) / count(W_(i-1)) )
    where count(W_(i-1), W_i) denotes the number of texts in the corpus set in which word W_(i-1) and word W_i occur together in the same text, count(W_(i-1)) denotes the number of texts in the corpus set in which word W_(i-1) occurs, and argmax denotes the maximization operation.
  13. The intelligent emotion determination apparatus according to claim 12, characterized in that performing keyword extraction on the standard corpus set based on the keyword extraction algorithm to obtain the keyword data set comprises:
    computing the dependency relatedness Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    Dep(W_i, W_j) = b^(len(W_i, W_j) − 1)
    where len(W_i, W_j) denotes the length of the dependency path between words W_i and W_j, and b is a hyperparameter;
    computing the gravitational value f_grav(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d²
    where tfidf(W_i) and tfidf(W_j) denote the term frequency-inverse document frequency values of words W_i and W_j, and d denotes the Euclidean distance between the word vectors of W_i and W_j;
    computing the weight coefficient weight(W_i, W_j) between any two words W_i, W_j in the standard corpus set from the dependency relatedness Dep(W_i, W_j) and the gravitational value f_grav(W_i, W_j):
    weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
    and sorting the weight coefficients by magnitude and selecting the words with the largest weight coefficients weight(W_i, W_j), to complete the keyword extraction and obtain the keyword data set.
  14. The intelligent emotion determination apparatus according to claim 13, characterized in that receiving the word vector set for training comprises performing a convolution operation and an activation operation on the word vector set;
    the convolution operation is:
    v′ = (v − k + 2p)/s + 1
    where v′ is the convolution set output by the convolution operation, v is the word vector set, k is the size of the convolution kernel, s is the stride of the convolution operation, and p is the zero-padding matrix;
    the activation function is:
    y = 1/(1 + e^(−x))
    where y is the training value and e is Euler's number, an infinite non-repeating decimal.
  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an intelligent emotion determination program, which can be executed by one or more processors to implement the following steps:
    receiving a corpus set, which comprises a basic data set and a scenario data set, and a label set, and performing preprocessing operations comprising word segmentation and stop-word removal on the corpus set to obtain a standard corpus set;
    performing keyword extraction on the standard corpus set based on a keyword extraction algorithm to obtain a keyword data set, and performing a word vectorization operation on the keyword data set to obtain a word vector set;
    inputting the word vector set into a convolutional neural network of a sentiment analysis model and inputting the label set into a loss function of the sentiment analysis model, wherein the convolutional neural network receives the word vector set and is trained to obtain a training value, the training value is input into the loss function, the loss function computes a loss value based on the label set and the training value, the loss value is compared against a preset training threshold of the convolutional neural network, and the convolutional neural network exits training once the loss value is smaller than the preset training threshold;
    receiving text data input by a user, inputting the text data into the sentiment analysis model to determine its sentiment tendency, and outputting a determination result.
  16. The computer-readable storage medium according to claim 15, characterized in that:
    the basic data set comprises a Weibo comment set and a movie review set;
    the scenario data set comprises a stock comment set, a government work report comment set, and a company financial statement comment set;
    the label set comprises three sentiment labels: happy, sad, and neutral.
  17. The computer-readable storage medium according to claim 15, characterized in that the word segmentation comprises building a probabilistic segmentation model P(S) from the corpus set, maximizing the probabilistic segmentation model P(S), and performing the segmentation operation on the corpus set using the maximized probabilistic segmentation model P(S).
  18. The computer-readable storage medium according to claim 17, wherein the probabilistic segmentation model P(S) is:
    P(S) = P(W_1, W_2, …, W_m) = p(W_1) · p(W_2|W_1) · … · p(W_m|W_(m-1))
    where W_1, W_2, …, W_m are the words comprised in the corpus set, m is the number of words comprised in the corpus set, and p(W_i|W_(i-1)) denotes the probability of word W_i occurring given that word W_(i-1) has occurred;
    the maximization of the probabilistic segmentation model P(S) is:
    p(W_i|W_(i-1)) = argmax( count(W_(i-1), W_i) / count(W_(i-1)) )
    where count(W_(i-1), W_i) denotes the number of texts in the corpus set in which word W_(i-1) and word W_i occur together in the same text, count(W_(i-1)) denotes the number of texts in the corpus set in which word W_(i-1) occurs, and argmax denotes the maximization operation.
  19. The computer-readable storage medium according to claim 18, characterized in that performing keyword extraction on the standard corpus set based on the keyword extraction algorithm to obtain the keyword data set comprises:
    computing the dependency relatedness Dep(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    Dep(W_i, W_j) = b^(len(W_i, W_j) − 1)
    where len(W_i, W_j) denotes the length of the dependency path between words W_i and W_j, and b is a hyperparameter;
    computing the gravitational value f_grav(W_i, W_j) between any two words W_i, W_j in the standard corpus set:
    f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d²
    where tfidf(W_i) and tfidf(W_j) denote the term frequency-inverse document frequency values of words W_i and W_j, and d denotes the Euclidean distance between the word vectors of W_i and W_j;
    computing the weight coefficient weight(W_i, W_j) between any two words W_i, W_j in the standard corpus set from the dependency relatedness Dep(W_i, W_j) and the gravitational value f_grav(W_i, W_j):
    weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
    and sorting the weight coefficients by magnitude and selecting the words with the largest weight coefficients weight(W_i, W_j), to complete the keyword extraction and obtain the keyword data set.
  20. The computer-readable storage medium according to claim 19, characterized in that receiving the word vector set for training comprises performing a convolution operation and an activation operation on the word vector set;
    the convolution operation is:
    v′ = (v − k + 2p)/s + 1
    where v′ is the convolution set output by the convolution operation, v is the word vector set, k is the size of the convolution kernel, s is the stride of the convolution operation, and p is the zero-padding matrix;
    the activation function is:
    y = 1/(1 + e^(−x))
    where y is the training value and e is Euler's number, an infinite non-repeating decimal.
PCT/CN2019/117336 2019-06-18 2019-11-12 Intelligent emotion determination method and apparatus, and computer-readable storage medium WO2020253042A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910530889.7A CN110442857B (zh) 2019-06-18 2019-06-18 Intelligent emotion determination method and apparatus, and computer-readable storage medium
CN201910530889.7 2019-06-18

Publications (1)

Publication Number Publication Date
WO2020253042A1 true WO2020253042A1 (zh) 2020-12-24

Family

ID=68429235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117336 WO2020253042A1 (zh) 2019-06-18 2019-11-12 情感智能判断方法、装置及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN110442857B (zh)
WO (1) WO2020253042A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255328A (zh) * 2021-06-28 2021-08-13 北京京东方技术开发有限公司 Language model training method and application method
CN113434631A (zh) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Event-based sentiment analysis method and apparatus, computer device, and storage medium
CN113591471A (zh) * 2021-08-20 2021-11-02 上海大参林医疗健康科技有限公司 Character- and word-based language feature extraction apparatus and method
CN113722483A (zh) * 2021-08-31 2021-11-30 平安银行股份有限公司 Topic classification method, apparatus, device, and storage medium
CN114580427A (zh) * 2021-12-29 2022-06-03 北京邮电大学 Self-media user selection method and related device
CN115659995A (zh) * 2022-12-30 2023-01-31 荣耀终端有限公司 Text sentiment analysis method and apparatus
CN116402048A (zh) * 2023-06-02 2023-07-07 布比(北京)网络技术有限公司 Interpretable blockchain application trend analysis method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860841B (zh) * 2021-01-21 2023-10-24 平安科技(深圳)有限公司 Text sentiment analysis method, apparatus, device, and storage medium
CN114386436B (zh) * 2022-01-21 2023-07-18 平安科技(深圳)有限公司 Text data analysis method, model training method, apparatus, and computer device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6820044B2 (en) * 2001-10-09 2004-11-16 University Of Maryland Method and apparatus for a common-cause failure module for probabilistic risk assessment tools
CN108717406A (zh) * 2018-05-10 2018-10-30 平安科技(深圳)有限公司 Text emotion analysis method, apparatus, and storage medium
CN108875049A (zh) * 2018-06-27 2018-11-23 中国建设银行股份有限公司 Text clustering method and apparatus
CN109766437A (zh) * 2018-12-07 2019-05-17 中科恒运股份有限公司 Text clustering method, text clustering apparatus, and terminal device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544242B (zh) * 2013-09-29 2017-02-15 广东工业大学 Microblog-oriented sentiment entity search system
CN108170667B (zh) * 2017-11-30 2020-06-23 阿里巴巴集团控股有限公司 Word vector processing method, apparatus, and device
CN108345587B (zh) * 2018-02-14 2020-04-24 广州大学 Review authenticity detection method and system
CN108647219A (zh) * 2018-03-15 2018-10-12 中山大学 Convolutional neural network text sentiment analysis method combined with a sentiment lexicon
CN108984523A (zh) * 2018-06-29 2018-12-11 重庆邮电大学 Product review sentiment analysis method based on a deep learning model

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434631A (zh) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Event-based sentiment analysis method and apparatus, computer device, and storage medium
CN113434631B (zh) * 2021-06-25 2023-10-13 平安科技(深圳)有限公司 Event-based sentiment analysis method and apparatus, computer device, and storage medium
CN113255328A (zh) * 2021-06-28 2021-08-13 北京京东方技术开发有限公司 Language model training method and application method
CN113255328B (zh) * 2021-06-28 2024-02-02 北京京东方技术开发有限公司 Language model training method and application method
CN113591471A (zh) * 2021-08-20 2021-11-02 上海大参林医疗健康科技有限公司 Character- and word-based language feature extraction apparatus and method
CN113722483A (zh) * 2021-08-31 2021-11-30 平安银行股份有限公司 Topic classification method, apparatus, device, and storage medium
CN113722483B (zh) * 2021-08-31 2023-08-22 平安银行股份有限公司 Topic classification method, apparatus, device, and storage medium
CN114580427A (zh) * 2021-12-29 2022-06-03 北京邮电大学 Self-media user selection method and related device
CN115659995A (zh) * 2022-12-30 2023-01-31 荣耀终端有限公司 Text sentiment analysis method and apparatus
CN115659995B (zh) * 2022-12-30 2023-05-23 荣耀终端有限公司 Text sentiment analysis method and apparatus
CN116402048A (zh) * 2023-06-02 2023-07-07 布比(北京)网络技术有限公司 Interpretable blockchain application trend analysis method and system
CN116402048B (zh) * 2023-06-02 2023-10-10 布比(北京)网络技术有限公司 Interpretable blockchain application trend analysis method and system

Also Published As

Publication number Publication date
CN110442857B (zh) 2024-05-10
CN110442857A (zh) 2019-11-12

Similar Documents

Publication Publication Date Title
WO2020253042A1 (zh) Intelligent emotion determination method and apparatus, and computer-readable storage medium
CN110222160B (zh) Intelligent semantic document recommendation method and apparatus, and computer-readable storage medium
Sun et al. Sentiment analysis for Chinese microblog based on deep neural networks with convolutional extension features
WO2021068339A1 (zh) Text classification method and apparatus, and computer-readable storage medium
WO2020082560A1 (zh) Text keyword extraction method, apparatus, and device, and computer-readable storage medium
CN108959431B (zh) Automatic label generation method, system, computer-readable storage medium, and device
Tang et al. Document modeling with gated recurrent neural network for sentiment classification
Akuma et al. Comparing Bag of Words and TF-IDF with different models for hate speech detection from live tweets
WO2020237856A1 (zh) Knowledge-graph-based intelligent question answering method and apparatus, and computer storage medium
Tymoshenko et al. Convolutional neural networks vs. convolution kernels: Feature engineering for answer sentence reranking
CN109471944B (zh) Text classification model training method and apparatus, and readable storage medium
CN104899322A (zh) Search engine and implementation method thereof
CN110162771B (zh) Event trigger word recognition method and apparatus, and electronic device
WO2020258481A1 (zh) Personalized intelligent text recommendation method and apparatus, and computer-readable storage medium
CN110175221B (zh) Spam SMS recognition method using word vectors combined with machine learning
WO2020253043A1 (zh) Intelligent text classification method and apparatus, and computer-readable storage medium
WO2021000391A1 (zh) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN107844533A (zh) Intelligent question answering system and analysis method
WO2021175005A1 (zh) Vector-based document retrieval method and apparatus, computer device, and storage medium
CN114330343B (zh) Part-of-speech-aware nested named entity recognition method, system, device, and storage medium
WO2021051934A1 (zh) Artificial-intelligence-based contract key clause extraction method and apparatus, and storage medium
Mehta et al. Sentiment analysis of tweets using supervised learning algorithms
Roy et al. An ensemble approach for aggression identification in English and Hindi text
CN112395421B (zh) Course label generation method and apparatus, computer device, and medium
WO2020248366A1 (zh) Intelligent text intention classification method and apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933582

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933582

Country of ref document: EP

Kind code of ref document: A1