CN107729320B - Emoticon recommendation method based on time sequence analysis of user session emotion trend - Google Patents


Info

Publication number
CN107729320B
CN107729320B
Authority
CN
China
Prior art keywords: emotion, user, dictionary, emoticon, conversation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710976797.2A
Other languages
Chinese (zh)
Other versions
CN107729320A (en)
Inventor
高岭
周俊鹏
曹瑞
杨旭东
郑杰
杨建峰
高全力
王海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern University
Original Assignee
Northwestern University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern University
Priority to CN201710976797.2A
Publication of CN107729320A
Application granted
Publication of CN107729320B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 40/30 — Handling natural language data: semantic analysis
    • G06F 16/36 — Information retrieval: creation of semantic tools, e.g. ontology or thesauri
    • G06F 17/18 — Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F 40/289 — Natural language analysis: phrasal analysis, e.g. finite state techniques or chunking
    • H04L 67/141 — Session management: setup of application sessions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

An emoticon recommendation method based on time-series analysis of the emotional trend of a user's conversation. The method mines the user's chat records and analyzes the emotion values of the conversation to construct a mapping relation of emoticons in an emotion matrix; analyzes the conversation history with an emotion dictionary to extract emotion keywords; computes a 21-dimensional emotion vector for the conversation from the emotion keywords and the calculation rules; performs single-step prediction of the development of the user's current conversation emotion vector with a time-series (ARMA/ARIMA) model; and, according to the prediction result, selects the expression group closest to the user's emotional trend from the mapping relation via a nearest-neighbour algorithm (KNN) and generates a recommendation list. With the technical scheme provided by the invention, while the user is using a chat tool, emoticons that fit the current user's emotion and conversational context can be recommended promptly and accurately, greatly simplifying the otherwise tedious operation of selecting emoticons, improving recommendation coverage, and enhancing the user experience.

Description

Emoticon recommendation method based on time sequence analysis of user session emotion trend
Technical Field
The invention relates to the technical field of intelligent recommendation, in particular to an emoticon recommendation method based on time sequence analysis of user session emotion trends.
Background
Emoticons are today, apart from language and text, the most important means by which people express their emotions and interact in everyday internet chat, and different emoticons carry different, rich meanings. At the end of 2015, Oxford Dictionaries for the first time chose an emoji as its Word of the Year: "Face with Tears of Joy", officially glossed as a face laughing so hard that it cries. Emoticons are thus highly representative as a product of the 21st century's demand for fast, condensed visual communication. A pictographic image such as an emoji can fill the emotional gaps of plain text and inject tone into written words, making modes of communication rich and colorful; it thereby transcends the limits of language, becomes an entity that can exist independently of language, and plays a very important role on the internet.
As the number of emoticons keeps growing, users run into "choice overload" with them. Since big data attracted wide attention, recommender systems have been broadly accepted and deeply developed and applied as an effective way to relieve information overload. The need for emoticon recommendation shows itself when the available emoticons are so numerous that the user cannot choose, when the user struggles to find a suitable emoticon for a quick reply, and when emoticons no longer hold the user's interest.
Applying recommendation to emoticons can solve the popularization problem, helping the creators of emoticons obtain greater economic benefit and have more of their works accepted by users; at the same time it better fits users' habits, making chatting easier, faster, more convenient, and more personalized.
At present, emoticon recommendation mainly uses the user's frequency of use as its basis: by counting daily usage, the most frequently used emoticons are placed in a recommendation list. This approach, however, embodies no real recommendation function and also limits the spread of emoticons. In Chinese input methods, the expression corresponding to the word the user intends to type is mostly predicted from the partial pinyin entered, or matched against the Chinese label attached to the emoticon; strictly speaking, these are not recommendation algorithms, but merely a replay of history or a lookup of expressions by label.
The ultimate goal of a mature recommender system is to surface products the user would likely be interested in but rarely encounters, satisfying the user's appetite for novelty and maximizing potential benefit. For emoticons, however, most systems still fall back on the traditional frequency-based method, which cannot adapt to users' differing needs. The invention therefore takes full account of users' day-to-day experience and the needs they generate, and proposes a new emoticon recommendation method: analyze the dialogue history with an emotion dictionary to extract emotion keywords; analyze the emotional change before and after each emoticon is used by the user to compute an emoticon-emotion value mapping dictionary; analyze the time information and compute the emotion value of the next time period with an autoregressive integrated moving average (ARIMA) model; and finally query and compute the recommended emoticons from the emoticon dictionary.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide an emoticon recommendation method based on time-series analysis of the user's conversational emotion trend, which recommends emoticons promptly and accurately while the user uses a chat tool, compensating for the flatness of plain text and expressing richer emotion. The method mainly mines the latent emotion values in the user's dialogue records, extracting the information units that carry emotional content and converting them into structured data a computer can recognize, while dividing basic emotion into seven categories — liking (love, admiration), disgust, joy (happiness), anger, sadness, fear, and desire — and quantifying them to establish an emoticon-emotion index matrix. The user's conversation history is analyzed with the emotion dictionary to extract emotion keywords, from which the emotional change before and after the user sends each emoticon is analyzed, so that the emoticons the user needs while chatting can be computed more accurately. A suitable time-series model is then built and used to predict the emotional trend of the user's current conversation; the expression group closest to that trend is selected from the emoticon-emotion matrix relation, and a user recommendation list is generated. The invention also provides an example of the constructed emotion dictionary, chiefly an extended emotion dictionary, a tone-word auxiliary emotion dictionary, and a punctuation-mark auxiliary emotion dictionary.
To achieve this purpose, the technical solution adopted by the invention is as follows:
an emoticon recommendation method based on time sequence analysis of user session emotion trends comprises the following steps:
1) mining the user's chat records and preprocessing them, then analyzing the emotion values of the conversation so as to construct the mapping relation of the emoticons in the emotion matrix;
2) analyzing the conversation history with an emotion dictionary to extract emotion keywords; the emotion dictionary is divided into a word emotion dictionary, a tone-word emotion dictionary, and a punctuation-mark emotion dictionary; when the emotion dictionary is applied, a forward maximum matching method is used: entries in the user dictionary are arranged from long to short, so that the longest directly searchable phrases and words are matched first;
3) calculating a 21-dimensional emotion vector of the conversation through the emotion keywords and the calculation rule;
4) performing single-step prediction of the development of the user's current conversation emotion vector with a time-series model, and, according to the prediction result, selecting the expression group closest to the user's emotional trend from the mapping relation via the nearest-neighbour algorithm and generating a recommendation list.
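For illustration only, the forward maximum matching mentioned in step 2) can be sketched as a minimal Python routine; the toy dictionary and the longest-first scan below are simplified stand-ins for the patent's actual user dictionary, not its implementation.

```python
# A minimal sketch of forward maximum matching ("FMM") word segmentation:
# at each position, try the longest dictionary entry first, falling back to
# a single character when nothing matches. The dictionary is illustrative.
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily, preferring the longest dictionary entry."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words
```

Because entries are tried from long to short, a phrase like "abc" wins over its prefix "ab", which is exactly the "arranged from long to short" behaviour the step describes.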
Obtaining the user's chat records and analyzing the emotion values of the conversation in step 1) comprises: mining the user's chat-record information, which is divided into text information and voice information; and, using the user's existing chat records, building a unique personal emotion dictionary through filtering, word segmentation, stop-word removal, and matching against the emotion dictionary, which is then used to mark the emotion values of emoticons.
Constructing the mapping relation of the emoticons in the emotion matrix in step 1) comprises computing the user's emotion values to obtain an expression-emotion index calculation matrix, and counting the two-dimensional relation between each expression and the emotion values it can convey. The calculation of the emoticon-emotion value matrix covers the emoticon-emotion value mapping relation, which mainly describes how each user's sent emoticons manifest in emotion values. Since not every utterance contains a computable emotion, during emotion-value calculation the top k emotional utterances in which an expression appears should be extracted, and it must be ensured that the user dictionary used for word segmentation and the emotion dictionary used for emotion-value calculation share the same entries, so as to maximize the dictionary-matching effect.
Computing the emotion values of the user's session records in step 2) comprises: determining the emotion division rule according to Ekman's classification, expanded to 21 subclasses; establishing a reference standard through this emotion division and quantifying the actual emotional expression of specific words; analyzing the conversation history with the emotion dictionary to extract emotion keywords, which includes building the corresponding emotion dictionary; and performing word segmentation and extraction on the user's historical conversation records, building an auxiliary emotion tone-word list and an emotion punctuation-mark list for tone words and punctuation marks.
Computing the 21-dimensional emotion vector of the conversation from the emotion keywords and the calculation rules in step 3) comprises extracting the first n emotional utterances in the user's conversation records and preprocessing them by word segmentation and filtering; the processed sentences are then looked up in the emotion dictionary and the overall expectation of each sentence's emotional tendency is computed, yielding the 21-dimensional emotion vectors corresponding to the expressions.
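The dictionary-matching accumulation behind the 21-dimensional vector can be illustrated with a small sketch. Here a 3-class toy dictionary mapping word → (dimension, intensity) stands in for the full 21-dimensional emotion dictionary; all entries are invented, and the real calculation rules are more elaborate.

```python
# A minimal sketch: match segmented words against an emotion dictionary and
# accumulate intensities into a per-sentence emotion vector. A 3-dimensional
# vector stands in for the patent's 21-dimensional one.
def sentence_vector(words, emotion_dict, n_dims=3):
    vec = [0.0] * n_dims
    for w in words:
        if w in emotion_dict:
            dim, intensity = emotion_dict[w]  # (emotion class index, strength)
            vec[dim] += intensity
    return vec
```

Words absent from the dictionary simply contribute nothing, mirroring the observation above that not every utterance carries computable emotion.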
In step 1), the chat-record information is preprocessed and the dialogue emotion data are arranged in temporal order to form a random time series:

{Emotion_i}, i = t_1, t_2, t_3, …, t_n

Repeated sentences in the session record are deduplicated, and curve fitting and resampling are applied to incomplete information.
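The ordering-and-deduplication part of this preprocessing might look like the sketch below; the record layout (timestamp, sentence, emotion value) is an assumption for illustration, and curve fitting/resampling are out of scope here.

```python
# A minimal sketch of step 1)'s preprocessing: sort dialogue emotion samples
# by timestamp and drop exact duplicate sentences, yielding the random time
# series {Emotion_i}. Field layout is illustrative.
def build_series(records):
    """records: iterable of (timestamp, sentence, emotion_value)."""
    seen, series = set(), []
    for ts, sentence, value in sorted(records, key=lambda r: r[0]):
        if sentence in seen:       # deduplicate repeated sentences
            continue
        seen.add(sentence)
        series.append((ts, value))
    return series
```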
In step 4), single-step prediction of the development of the user's current dialogue emotion vector with a time-series model comprises extracting the conversation history, i.e. computing a law of change from historical data along the time dimension and extrapolating it into the future to predict future change; and building a time-series analysis model — the AR model, the MA model, and their combination ARMA — where the general form of ARMA(p, q) is:

Y_t = β_0 + β_1 Y_{t−1} + β_2 Y_{t−2} + … + β_p Y_{t−p} + ε_t + α_1 ε_{t−1} + … + α_q ε_{t−q}

where p and q are the autoregressive order and moving-average order of the model; β and α are the autoregressive and moving-average coefficients; ε_t is the error term; and Y_t is a stationary, normal, zero-mean time series. Let the difference operator be

∇ = 1 − B

where B is the backshift operator. Applying a d-order difference operation to a non-stationary sequence {X_t} yields a new sequence

W_t = ∇^d X_t

which is stationary. If this sequence is assumed to fit an ARMA(p, q) model, then by the model's algebraic form the autoregressive coefficient polynomial is:

φ(B) = 1 − φ_1 B − φ_2 B² − … − φ_p B^p

and the moving-average coefficient polynomial is:

θ(B) = 1 − θ_1 B − θ_2 B² − … − θ_q B^q

If the data are not stationary, an ARIMA(p, d, q) model is used after differencing, and the formula becomes:

φ(B) ∇^d X_t = θ(B) ε_t

where d is the number of differences taken in the actual smoothing process, not more than 2. The model parameters of ARMA(p, q) are estimated by least squares, i.e. by minimizing the residual sum of squares. With the parameter set

δ = (β_1, β_2, …, β_p, α_1, α_2, …, α_q)^T

let

Q(δ) = Σ_{t=1}^{n} ε_t²(δ)

reach its minimum; the resulting δ̂ is the least-squares estimate of the original parameter set, and the least-squares estimate of the white-noise variance σ_ε² is:

σ̂_ε² = Q(δ̂) / (n − p − q)

Finally, the stationarity of the data is verified and a time-series analysis flow model is constructed: after a stationarity test, a stationary sequence is fitted directly with an ARMA(p, q) model, while a non-stationary sequence is differenced, re-tested for stationarity, and then fitted with an ARIMA(p, d, q) model.
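As an illustration of this flow, the sketch below differences the series, fits a plain least-squares AR(1) as a simplified stand-in for a full ARMA(p, q), and inverts the differencing to produce the single-step forecast. This is not the patent's actual procedure; a production version would use a proper ARIMA implementation (e.g. statsmodels).

```python
# Simplified ARIMA-style single-step forecast: difference d times (d <= 2,
# as in the text), fit AR(1) by least squares, forecast the next difference,
# then undo the differencing.
def difference(series, d):
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def ar1_fit(series):
    # least-squares intercept/slope of y_t regressed on y_{t-1}
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    return my - beta1 * mx, beta1          # (beta0, beta1)

def forecast_next(series, d=1):
    diffed = difference(series, d)
    b0, b1 = ar1_fit(diffed)
    step = b0 + b1 * diffed[-1]            # forecast of the next d-th difference
    # invert the d-order differencing by adding back the last value at each level
    tails = [series]
    for _ in range(d):
        tails.append(difference(tails[-1], 1))
    value = step
    for level in reversed(tails[:-1]):
        value += level[-1]
    return value
```

On a series whose first differences grow linearly (e.g. 1, 2, 4, 7, 11), the differenced series 1, 2, 3, 4 fits AR(1) exactly and the forecast continues the pattern.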
Selecting the expression group closest to the user's emotional trend and generating an emoticon recommendation list in step 4) comprises analyzing, from the user's historical conversation records, the user's understanding and usage habits for each emoticon, and, combined with the emoticon-emotion mapping table, recommending the next emoticons that fit the user's emotional trend.
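The nearest-neighbour selection over the emoticon-emotion value matrix might look like this minimal sketch; Euclidean distance and the 2-dimensional toy vectors are illustrative assumptions standing in for the 21-dimensional case.

```python
# A hedged sketch of the KNN lookup: rank candidate emoticons by Euclidean
# distance between the predicted emotion vector and each emoticon's vector
# in the emoticon-emotion value matrix, and return the k closest.
import math

def recommend(predicted, emoticon_matrix, k=3):
    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(predicted, v)))
    ranked = sorted(emoticon_matrix.items(), key=lambda kv: dist(kv[1]))
    return [emo for emo, _ in ranked[:k]]
```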
Constructing the emotion dictionary in step 2): a triple is used to represent a vocabulary item in the emotion vocabulary ontology, where info denotes the item's ontology information, including its number, gloss, corresponding English translation, part of speech, and annotator; relation denotes the direct relations between vocabulary items, including synonymy, antonymy, and the like; and emotion denotes the item's emotional information:

Lexicon_i = (info, relation, emotion)

The emotion information of each word comprises part-of-speech type, number of word senses, emotion class, intensity, polarity, sub-emotion class, sub-intensity, sub-polarity, and so on.
The constructed tone-word emotion dictionary grades emotion intensity from weak to strong on 7 levels, with polarity encoded as 0 for neutral, 1 for positive, 2 for negative, and 3 for both positive and negative, and corresponding calculation optimization is performed for the tone words appearing in the dialogue system.
The tone-word auxiliary emotion table follows the construction rules of the conventional verb-and-noun emotion dictionary, and for each tone word three contexts are considered: the tone word itself, the character/word appearing before it, and the character/word appearing after it.
The construction of the punctuation emotion dictionary comprises building a punctuation emotion auxiliary table: usage and expressive effect are obtained by consulting Chinese reference documents, dictionaries, and the like, and the way each mark influences the emotion value is then constructed manually according to that effect.
The invention has the beneficial effects that:
the recommendation method mainly mines the potential emotion value existing in the user dialogue record, aims to extract the information unit of the emotion content contained in the user dialogue record, converts the information content into the structured data recognizable by a computer, and simultaneously divides the basic emotion into: seven categories of ' good (love, worship), bad, happy (happiness), anger, sadness, fear, desire ' and the like ' are quantified, so as to establish an expression symbol-emotion index matrix. And analyzing the user conversation history record by using the emotion dictionary to calculate the emotion key words, thereby analyzing the emotion change before and after the user uses each emoticon, and more accurately calculating the emoticons required in the user chatting process through the emotion change. And then establishing a proper time sequence model, predicting the emotional trend of the current conversation of the user by using the time sequence model, selecting an expression group closest to the emotional trend of the user from the expression symbol-emotion matrix relation, and generating a user recommendation list. Meanwhile, the invention provides an example of the constructed emotion dictionary, and mainly relates to an extended emotion dictionary, a Chinese word auxiliary emotion dictionary and a punctuation mark auxiliary emotion dictionary. The method and the device help the user to better select the emoticons suitable for the current context under the conversation scene, so that more accurate emoticon recommendation is brought to the user.
Drawings
Fig. 1 is a general block diagram of an emoticon recommendation method according to the present invention.
Fig. 2 is an overall flowchart of an emoticon recommendation method according to the present invention.
FIG. 3 is a flowchart of the emoticon-emotion value matrix calculation of the present invention.
Fig. 4 is a flowchart of time-series analysis of the emoticon recommendation method of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, but the present invention is not limited to the following embodiments.
The invention provides a general framework of an emoticon recommendation method, as shown in fig. 1, the framework specifically comprises the following links:
s11, discovering the chat records of the users, and analyzing the emotion values of the conversations;
s12, constructing a mapping relation of the expression symbol-emotion value matrix and constructing an emotion dictionary;
s13, analyzing the dialogue history record by using an emotion dictionary to calculate emotion keywords;
s14, calculating a 21-dimensional emotion matrix of the conversation according to the emotion keywords;
s15, performing single-step prediction on the emotion of the current conversation of the user by using the time series model;
and S16, generating a recommendation list according to the nearest neighbor in the expression symbol-emotion value matrix according to the single-step prediction result.
According to the technical scheme provided by the invention, when the user uses a chat tool, emoticons that fit the current user's emotion and conversational context can be recommended promptly and accurately, greatly simplifying the otherwise tedious operation of selecting emoticons and enhancing the user experience.
Fig. 2 is an overall flowchart of the emoticon recommendation method provided in the present invention, which mainly includes the following steps:
initializing the emoticon-emotion value matrix, acquiring the user's chat data, and filtering and cleaning the data;
selecting the first k rows of data containing emoticons according to the extraction rule;
preprocessing the selected k records, including filtering, word segmentation, and stop-word removal;
constructing an emotion dictionary to match the segmentation results, comprising an extended emotion dictionary, a tone-word auxiliary emotion dictionary, and a punctuation-mark auxiliary emotion dictionary;
calculating the 21-dimensional emotion value vector and preprocessing it for time-series modeling;
predicting the next emotional tendency with the time-series model;
producing the corresponding recommendation result by querying the emoticon-emotion value matrix, and judging whether the recommendation succeeded; if so, updating the emoticon-emotion value matrix.
Before the emotion dictionary is constructed, the emotions must be divided, as in Table 1. Dividing the emotions effectively establishes a reference standard and correspondingly quantifies the actual emotional expression of specific words, so that each emoticon and its emotion value form a one-to-one mapping, which makes it convenient to build the emoticon-emotion value matrix afterwards.
TABLE 1 Emotion partitioning example
The construction of the emotion dictionary not only adds new words absent from the original dictionary, but also builds two auxiliary tables — an emotion tone-word list and an emotion punctuation list — for tone words and punctuation marks, used to assist the original dictionary in matching and calculation and make the emotion dictionary more accurate. The details are as follows:
(1) conventional verbs and nouns
A triple is used to represent a vocabulary item in the emotion vocabulary ontology: info denotes the item's ontology information, including its number, gloss, corresponding English translation, part of speech, and annotator; relation denotes the direct relations between vocabulary items, including synonymy, antonymy, and the like; and emotion denotes the item's emotional information, which is the part mainly used here.

Lexicon_i = (info, relation, emotion)

The emotion information of each word includes part-of-speech type, number of word senses, emotion class, intensity, polarity, sub-emotion class, sub-intensity, sub-polarity, and so on. On the basis of the original 27,466 emotion words, information for newly added extended words is entered as follows:
For each newly added vocabulary item:
1. if the item already appears in the original emotion lexicon, it is not processed;
2. if the item does not appear in the emotion lexicon, then:
a. its synonyms are looked up in info, and each synonym is searched in the original emotion lexicon; if found, that word's emotion class is copied to the new item;
b. if no synonym exists in the emotion lexicon, the Chinese gloss corresponding to the English translation is searched in the emotion lexicon, and its emotion class is recorded for the new item;
3. the emotional intensity is determined preliminarily by computing the mutual information between the new word and standard seed words, and unreasonable calculation results are adjusted manually.
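Steps 1 and 2a above can be sketched as follows. The dictionary layout and the stubbed default intensity are assumptions for illustration; step 2b's translation fallback and step 3's mutual-information intensity estimate are out of scope here.

```python
# A sketch of the new-word entry procedure: reuse an existing entry if
# present (step 1), otherwise inherit the emotion class from the first
# synonym found in the base lexicon (step 2a). Intensity is stubbed with a
# default in place of the mutual-information estimate of step 3.
def enter_word(word, synonyms, lexicon, default_intensity=3):
    if word in lexicon:                      # step 1: already present
        return lexicon[word]
    for syn in synonyms:                     # step 2a: borrow from a synonym
        if syn in lexicon:
            entry = dict(lexicon[syn], intensity=default_intensity)
            lexicon[word] = entry
            return entry
    return None                              # steps 2b/3 would fall back further
```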
(2) Tone-word auxiliary emotion dictionary
A tone-word auxiliary emotion table composed of common Chinese tone words is constructed from the relevant data (see Table 2 for an example). The emotion information is expressed in the format {emotion symbol | emotion intensity | polarity}: emotion intensity is graded from weak to strong on 7 levels, and polarity is encoded as 0 for neutral, 1 for positive, 2 for negative, and 3 for both positive and negative. Corresponding calculation optimization is performed for the tone words appearing in the dialogue system.
TABLE 2 Construction example of the tone-word auxiliary emotion table
(3) Punctuation assisted emotion dictionary
The punctuation emotion auxiliary table is constructed by consulting Chinese reference documents, dictionaries, and the like to obtain each mark's usage and expressive effect; the way each mark influences the emotion value is then constructed manually according to that effect, along with calculation rules that are taken into account during emotion-value calculation to optimize the result. Several common ways in which repeated punctuation marks convey emotion are listed, and a simple punctuation emotion dictionary is constructed, exemplified in Table 3 below.
TABLE 3 auxiliary presentation of punctuation emotions
Fig. 3 is a flowchart of the emoticon-emotion value matrix updating method of the emoticon recommendation method provided by the present invention, which mainly describes how the emoticons sent by each user manifest in emotion values. The calculation can be summarized as follows.
Emoticon-emotion value matrix calculation process:
Preprocessing:
the corpus is divided according to emoticon usage: for each emoticon, the top k records in which it was sent are selected as the instances for calculating its emotion value.
Calculation steps, for each emoticon:
1. perform word segmentation on each line of each instance;
2. match the segmentation results against the emotion dictionary; if found, record the result into the 21-dimensional emotion vector of that instance's sentence;
3. look up the usage of tone words and punctuation marks according to the tone-word matching rules and punctuation matching rules, and record the result into the sentence's emotion vector;
4. sum each dimension over the instance to obtain the instance's 21-dimensional vector;
5. average all instance vectors of the expression to obtain the emoticon's row of the emoticon-emotion value matrix.
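The averaging in the final step condenses to a short sketch; a 3-dimensional vector stands in for the 21-dimensional one, and all data are illustrative rather than taken from the patent.

```python
# A condensed sketch of the emoticon-emotion value matrix calculation: for
# each emoticon, average the per-sentence emotion vectors of its top-k
# instances dimension by dimension.
def emotion_matrix(instances):
    """instances: {emoticon: [per-sentence emotion vectors]}"""
    matrix = {}
    for emo, vectors in instances.items():
        n = len(vectors)
        matrix[emo] = [sum(dim) / n for dim in zip(*vectors)]
    return matrix
```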
Fig. 4 is the time series analysis flowchart of the emoticon recommendation method provided by the present invention. The basic idea of time series analysis is to predict the future from the historical behavior of a quantity: a pattern of change is estimated from historical data along the time dimension and extrapolated forward to forecast future change. The three main models of time series analysis are the AR model (autoregression), the MA model (moving average), and the ARMA model combining the two. The general form of ARMA(p, q) is:
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p} + ε_t + α_1 ε_{t-1} + … + α_q ε_{t-q}
In the formula, p and q are the autoregressive order and the moving-average order of the model; the β_i are the autoregressive coefficients and the α_i the moving-average coefficients; ε_t is the error term; Y_t is a stationary, normally distributed, zero-mean time series. Let the difference operator be
∇ = 1 − B, where B is the backshift operator (B X_t = X_{t-1}).
For a non-stationary sequence {X_t}, a d-order difference operation yields the new sequence
W_t = ∇^d X_t = (1 − B)^d X_t,
which is stationary. If this sequence is assumed to fit an ARMA(p, q) model, then according to the algebraic form of the model, the autoregressive coefficient polynomial is:
φ(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p
and the moving-average coefficient polynomial is:
θ(B) = 1 − θ_1 B − θ_2 B^2 − … − θ_q B^q
If the data are not stationary, an ARIMA(p, d, q) model is used for calculation after differencing, and the formula becomes:
φ(B)(1 − B)^d X_t = θ(B) ε_t
where d is the number of differencing operations actually needed to reach stationarity, generally not exceeding 2.
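The d-order difference operation ∇^d = (1 − B)^d amounts to taking successive differences of adjacent terms d times. A generic illustration (not code from the patent):

```python
# Minimal sketch of the d-order difference operator (1 - B)^d used to turn
# a non-stationary series into a stationary one before ARMA fitting.
def difference(series, d=1):
    """Apply the first-difference operator d times (d rarely exceeds 2)."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out
```

For instance, differencing a quadratic-growth series once leaves a linear trend, and differencing twice leaves a constant series.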
The model parameters of ARMA(p, q) are estimated by least squares, i.e. by the set of parameter values that minimizes the residual sum of squares. Writing the parameter set as
δ = (α_1, α_2, …, α_p, β_1, β_2, …, β_q)^T,
let
S(δ) = Σ_t ε_t²(δ)
reach its minimum; the minimizing value δ̂ is the least-squares estimate of the original parameter set. The variance of the white noise, σ_ε², then has the least-squares estimate
σ̂_ε² = S(δ̂) / (n − p − q).
Because the stationarity of the data must be verified, the time series analysis flow model shown in Fig. 4 is constructed: after a stationarity test, an ARMA(p, q) model is fitted directly to a stationary sequence, while a non-stationary sequence is differenced, tested for stationarity again, and finally fitted with an ARIMA(p, d, q) model.
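The Fig. 4 flow can be sketched as follows. A real implementation would use a proper stationarity test (e.g. ADF) and a full ARIMA fit; here a crude half-sample mean comparison (`looks_stationary`, with an invented threshold) and a closed-form least-squares AR(1) fit stand in for both, purely as an illustration of the control flow.

```python
def looks_stationary(series, tol=0.5):
    """Crude stand-in for a stationarity test: compare half-sample means."""
    h = len(series) // 2
    m1 = sum(series[:h]) / h
    m2 = sum(series[h:]) / (len(series) - h)
    return abs(m2 - m1) < tol

def fit_ar1(series):
    """Least-squares AR(1): minimize sum of (y_t - b0 - b1*y_{t-1})^2."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    den = sum((xi - mx) ** 2 for xi in x)
    if den == 0:               # constant series: slope undefined, use mean
        return my, 0.0
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / den
    return my - b1 * mx, b1

def fit_with_differencing(series, max_d=2):
    """Difference (at most max_d times) until stationary, then fit AR(1)."""
    d = 0
    while not looks_stationary(series) and d < max_d:
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
        d += 1
    return d, fit_ar1(series)
```

A linearly trending series fails the crude test, is differenced once into a constant series, and is then fitted directly.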
This method improves on frequency-based recommendation to a certain extent. More importantly, the technical method provided by the invention integrates the user's emotion trend and uses the time series to analyze the user's next emotional change, so as to generate the user's emoticon recommendation list more accurately. In addition, the technical methods in this specification are described in a progressive manner, the embodiments of the modules are closely related, and the key technical methods in the claims are described in detail in the specification.

Claims (9)

1. An emoticon recommendation method based on time sequence analysis of user session emotion trends is characterized by comprising the following steps:
1) mining the user's chat records, preprocessing them, and analyzing the emotion values of the conversation, so as to construct the mapping relationship of the emoticons in the emotion matrix;
2) analyzing the conversation history records by using an emotion dictionary to calculate emotion keywords; the emotion dictionary is divided into a Chinese-word emotion dictionary, a tone-word emotion dictionary and a punctuation-mark emotion dictionary; when the emotion dictionary is used for word matching and emotion value calculation, a forward maximum matching method is used: specifically, the entries of the user dictionary are grouped by length and arranged from long to short, so that the longest directly searchable phrases and words are matched preferentially;
3) calculating a 21-dimensional emotion vector of the conversation through the emotion keywords and the calculation rule;
4) and performing single-step prediction on the development of the current conversation emotion vector of the user through a time sequence model, and selecting an expression group of the emotion trend of the user from the mapping relation according to the prediction result through a nearest neighbor algorithm to generate a recommendation list.
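The forward maximum matching named in step 2) can be sketched as follows: at each position, the longest dictionary entry is tried first, falling back to shorter candidates and finally to a single character. The tiny user dictionary below is illustrative only.

```python
# Sketch of forward maximum matching segmentation over a toy dictionary.
USER_DICT = {"不开心", "开心", "今天", "我"}
MAX_LEN = max(len(w) for w in USER_DICT)   # longest entry bounds the window

def forward_max_match(text):
    tokens, i = [], 0
    while i < len(text):
        # try window lengths from long to short, so "不开心" beats "开心"
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if cand in USER_DICT or length == 1:  # single-char fallback
                tokens.append(cand)
                i += length
                break
    return tokens
```

For example, `forward_max_match("我今天不开心")` keeps "不开心" as one token instead of splitting off "开心", which is the point of the long-to-short ordering.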
2. The emoticon recommendation method based on time-series analysis of user session emotional trends according to claim 1, wherein the mining of user chat records described in step 1) includes preprocessing, the specific process comprising: primarily classifying the user's chat record information into text information and voice information; performing symbol filtering, word segmentation and stop-word removal on the text information; and establishing a unique personal emotion dictionary matched with the emotion dictionary for marking the emotion values of the emoticons.
3. The emoticon recommendation method based on time-series analysis of the emotional tendency of the user session according to claim 1, wherein the step 1) of constructing the mapping relationship of the emoticon in the emotion matrix comprises the steps of calculating the emotion value of the user to obtain an emotion-emotion index calculation matrix; the matrix can count the two-dimensional relationship between each expression and the emotion value which can be expressed by the expression, and is mainly used for describing the expression form of the emoticon sent by each user on the emotion value; in the emotion value calculation process, the first k utterances with emotions are extracted, and the user dictionary for word segmentation and the emotion dictionary for emotion value calculation are ensured to have the same entries, so that the dictionary matching effect is maximized.
4. The emoticon recommendation method according to claim 1, wherein the step 3) of calculating the 21-dimensional emotion vector of the conversation by emotion keywords and calculation rules comprises: determining the emotion partition rule according to Ekman's classification, expanded into 21 subclasses; establishing a reference standard through this emotion partition and quantifying the actual emotional expression of specific words; analyzing the conversation history records with the emotion dictionary to calculate emotion keywords, which includes establishing the corresponding emotion dictionary; and performing word segmentation and extraction on the user's historical conversation records, and establishing an auxiliary emotion tone-word list and an emotion punctuation-mark list for tone words and punctuation marks.
5. The emoticon recommendation method based on time-series analysis of emotion tendencies of a user conversation as claimed in claim 1, wherein the 21-dimensional emotion vector of the conversation is calculated in step 3) through emotion keywords and calculation rules, including extracting the first n emotional utterances recorded in the user conversation, performing word segmentation, removing redundant spaces, filtering stop words, and removing other symbols which do not appear in the emotion dictionary; and searching and matching the processed sentences in an emotion dictionary, and calculating the total expectation of the emotional tendency of the sentences so as to obtain 21-dimensional emotion vectors corresponding to the expressions.
6. The emoticon recommendation method based on time-series analysis of user session emotion tendency according to claim 1, wherein the current dialog emotion vector of the user involved in step 4) is a random time sequence formed by arranging the dialog emotion data of the user in time sequence, and the formula is as follows:
{Emotion_i}, i = t_1, t_2, t_3, …, t_n
and (4) carrying out data deduplication on the repeated sentences in the session record, and carrying out curve fitting and resampling processing on the incomplete information.
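Constructing the emotion time series of claim 6 (time-ordered emotion data with duplicate sentences removed) can be sketched as follows; the record field layout is an assumption for illustration, not the patent's data schema.

```python
# Sketch: build the user's emotion time series from session records,
# ordering by timestamp and dropping verbatim duplicate sentences.
def emotion_series(records):
    """records: list of (timestamp, sentence, emotion_value) tuples."""
    seen, series = set(), []
    for ts, sentence, value in sorted(records, key=lambda r: r[0]):
        if sentence in seen:        # de-duplicate repeated sentences
            continue
        seen.add(sentence)
        series.append((ts, value))
    return series
```

The resulting (timestamp, value) pairs form the random time sequence {Emotion_i} that the time series model consumes; curve fitting and resampling of incomplete spans would follow this step.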
7. The emoticon recommendation method based on time-series analysis of user session emotion tendency as claimed in claim 1, wherein in step 4), the development of the current dialog emotion vector of the user is predicted in a single step through a time series model, including extracting historical dialog records, that is, calculating the change rule through historical data of time dimension, and expanding the rule to the future, so as to predict the emotion change of the future dialog; establishing a time sequence analysis model, an AR model, an MA model and a combined ARMA model of the AR model and the MA model, wherein the formula of the ARMA (p, q) model is as follows:
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p} + ε_t + α_1 ε_{t-1} + … + α_q ε_{t-q}
in the formula, p and q of ARMA(p, q) are the autoregressive order and the moving-average order of the model;
let the difference operator be ∇ = 1 − B, where B is the backshift operator (B X_t = X_{t-1}); for a non-stationary sequence {X_t}, a d-order difference operation yields the new sequence
W_t = ∇^d X_t = (1 − B)^d X_t,
which is a stationary sequence; if this sequence is assumed to fit the ARMA(p, q) model, then according to the algebraic form of the model, the autoregressive coefficient polynomial is:
φ(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p
the polynomial formula of the moving average coefficient is as follows:
θ(B) = 1 − θ_1 B − θ_2 B^2 − … − θ_q B^q
if the data are not stationary, an ARIMA(p, d, q) model is used for calculation after differencing, and the formula becomes:
φ(B)(1 − B)^d X_t = θ(B) ε_t
wherein d is the number of differencing operations in the actual smoothing process, generally not more than 2; the model parameters of the ARIMA(p, d, q) are estimated by least squares, i.e. by the parameter values that minimize the residual sum of squares, with the parameter set:
δ = (α_1, α_2, …, α_p, β_1, β_2, …, β_q)^T; then let
S(δ) = Σ_t ε_t²(δ)
reach its minimum; the minimizing value δ̂ is the least-squares estimate of the original parameter set, wherein the variance of the white noise σ_ε² has the least-squares estimate
σ̂_ε² = S(δ̂) / (n − p − q);
and verifying the stationarity of the data, constructing a time sequence analysis flow model, directly fitting an ARMA (p, q) model for a stationary sequence after the data is subjected to stationarity detection, and fitting an ARIMA (p, d, q) model for a non-stationary sequence after the data is subjected to difference operation and stationarity detection again.
8. The emoticon recommendation method based on time-series analysis of emotional trends of users according to claim 1, wherein the emoticon recommendation method in step 4) selects k emoticon groups of emotional trends of users by using a nearest neighbor algorithm and generates an emoticon recommendation list, and comprises analyzing understanding and using habits of users for each emoticon by using emotional trends through a user history conversation record; and combining the emoticon-emotion mapping table to recommend the next emoticon which is in line with the emotion trend of the user.
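The nearest-neighbour selection of claim 8 can be sketched as follows. The patent does not specify the distance metric; Euclidean distance over the emotion vectors is an assumption here, as is the toy 2-dimensional matrix.

```python
# Sketch: rank emoticons by distance between the predicted emotion vector
# and each row of the emoticon-emotion matrix; return the k closest.
def recommend(predicted_vec, emoticon_matrix, k=3):
    def dist(v):
        # Euclidean distance (assumed metric, not specified by the patent)
        return sum((a - b) ** 2 for a, b in zip(predicted_vec, v)) ** 0.5
    ranked = sorted(emoticon_matrix.items(), key=lambda kv: dist(kv[1]))
    return [emo for emo, _ in ranked[:k]]
```

Given a predicted vector leaning toward the "positive" dimension, the positively weighted emoticons rank first in the recommendation list.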
9. The emoticon recommendation method based on time-series analysis of the user session emotion tendency as claimed in claim 1, wherein the emotion matrix constructed in step 3) comprises an emotion vocabulary ontology in which each Lexicon entry is a triple: info denotes the word's information, including number, explanation, corresponding English translation, part of speech, and inputter; relation denotes direct relations between words, including synonymy and antonymy; and emotion denotes the word's emotional information; this is expressed as:
Lexicon_i = (info, relation, emotion)
the emotion information of each word comprises part of speech type, word sense number, emotion classification, strength, polarity, sub-emotion classification, sub-strength and sub-polarity;
the emotion dictionary of Chinese words in step 2) comprises 7 grades of emotion intensity from weak to strong; positive and negative tendencies are divided as 0 neutral, 1 commendatory, 2 derogatory, and 3 both commendatory and derogatory, and corresponding calculation optimization is performed for the Chinese words appearing in the dialog system;
constructing the punctuation-mark emotion dictionary in step 2) comprises obtaining the usage and expressive effect of each punctuation mark by consulting the punctuation emotion auxiliary table and related Chinese documents and dictionaries, and then manually defining, according to that effect, the way each mark influences the emotion value.
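The Lexicon_i = (info, relation, emotion) triple of claim 9 can be modelled directly as data structures; the field names below follow the claim's description, while the concrete types are assumptions for illustration.

```python
# Sketch of the lexicon-entry triple as dataclasses (field types assumed).
from dataclasses import dataclass, field

@dataclass
class Info:                      # number, explanation, translation, POS, inputter
    number: int
    explanation: str
    english: str
    pos: str
    inputter: str

@dataclass
class Emotion:                   # class, 7-grade strength, 0-3 polarity, sub-emotion
    category: str
    strength: int                # 7 grades, weak to strong
    polarity: int                # 0 neutral, 1 commendatory, 2 derogatory, 3 both
    sub_category: str = ""
    sub_strength: int = 0
    sub_polarity: int = 0

@dataclass
class LexiconEntry:
    info: Info
    relation: dict = field(default_factory=dict)   # synonym / antonym links
    emotion: Emotion = None
```

A dictionary of such entries gives the emotion-value calculation both the matching vocabulary and the per-word emotion metadata in one place.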
CN201710976797.2A 2017-10-19 2017-10-19 Emoticon recommendation method based on time sequence analysis of user session emotion trend Active CN107729320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710976797.2A CN107729320B (en) 2017-10-19 2017-10-19 Emoticon recommendation method based on time sequence analysis of user session emotion trend


Publications (2)

Publication Number Publication Date
CN107729320A CN107729320A (en) 2018-02-23
CN107729320B true CN107729320B (en) 2021-04-13

Family

ID=61212056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710976797.2A Active CN107729320B (en) 2017-10-19 2017-10-19 Emoticon recommendation method based on time sequence analysis of user session emotion trend

Country Status (1)

Country Link
CN (1) CN107729320B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520046B (en) * 2018-03-30 2022-04-22 上海掌门科技有限公司 Method and device for searching chat records
CN108733838B (en) * 2018-05-29 2021-04-23 东北电力大学 User behavior prediction system and method based on multi-polar emotion analysis
CN109325112B (en) * 2018-06-27 2019-08-20 北京大学 A kind of across language sentiment analysis method and apparatus based on emoji
CN110895558B (en) * 2018-08-23 2024-01-30 北京搜狗科技发展有限公司 Dialogue reply method and related device
CN109145306A (en) * 2018-09-11 2019-01-04 刘瑞军 The three-dimensional expression generation method of text-driven
CN111190493A (en) * 2018-11-15 2020-05-22 中兴通讯股份有限公司 Expression input method, device, equipment and storage medium
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109977409A (en) * 2019-03-28 2019-07-05 北京科技大学 A kind of intelligent expression recommended method and system based on user's chat habit
CN111897990A (en) * 2019-05-06 2020-11-06 阿里巴巴集团控股有限公司 Method, device and system for acquiring expression information
CN110189742B (en) * 2019-05-30 2021-10-08 芋头科技(杭州)有限公司 Method and related device for determining emotion audio frequency, emotion display and text-to-speech
CN110619073B (en) * 2019-08-30 2022-04-22 北京影谱科技股份有限公司 Method and device for constructing video subtitle network expression dictionary based on Apriori algorithm
CN110717109B (en) * 2019-09-30 2024-03-15 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for recommending data
CN113360615B (en) * 2021-06-02 2024-03-08 首都师范大学 Dialogue recommendation method, system and equipment based on knowledge graph and time sequence characteristics
CN113360003B (en) * 2021-06-30 2023-12-05 北京海纳数聚科技有限公司 Intelligent text input method association method based on dynamic session scene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104618222A (en) * 2015-01-07 2015-05-13 腾讯科技(深圳)有限公司 Method and device for matching expression image
KR20160056994A (en) * 2014-11-12 2016-05-23 한양대학교 산학협력단 Method for Recommending Emoticon and User Device for Recommending Emoticon
CN105975563A (en) * 2016-04-29 2016-09-28 腾讯科技(深圳)有限公司 Facial expression recommendation method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160056994A (en) * 2014-11-12 2016-05-23 한양대학교 산학협력단 Method for Recommending Emoticon and User Device for Recommending Emoticon
CN104618222A (en) * 2015-01-07 2015-05-13 腾讯科技(深圳)有限公司 Method and device for matching expression image
CN105975563A (en) * 2016-04-29 2016-09-28 腾讯科技(深圳)有限公司 Facial expression recommendation method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Application of Mobile-User Emotion Prediction Based on Time Series Analysis; Cao Rui; China Master's Theses Full-text Database, Information Science and Technology; 2018-04-15 (No. 04); pp. 1-83 *

Also Published As

Publication number Publication date
CN107729320A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
CN107729320B (en) Emoticon recommendation method based on time sequence analysis of user session emotion trend
CN111767741B (en) Text emotion analysis method based on deep learning and TFIDF algorithm
CN106997375B (en) Customer service reply recommendation method based on deep learning
CN107862087B (en) Emotion analysis method and device based on big data and deep learning and storage medium
Macary et al. On the use of self-supervised pre-trained acoustic and linguistic features for continuous speech emotion recognition
CN109509470B (en) Voice interaction method and device, computer readable storage medium and terminal equipment
KR101634086B1 (en) Method and computer system of analyzing communication situation based on emotion information
JP2007018234A (en) Automatic feeling-expression word and phrase dictionary generating method and device, and automatic feeling-level evaluation value giving method and device
Ezzat et al. Sentiment analysis of call centre audio conversations using text classification
JP5506738B2 (en) Angry emotion estimation device, anger emotion estimation method and program thereof
CN110348024A (en) Intelligent identifying system based on legal knowledge map
CN110543547A (en) automobile public praise semantic emotion analysis system
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN107818173B (en) Vector space model-based Chinese false comment filtering method
Leini et al. Study on speech recognition method of artificial intelligence deep learning
CN115730203A (en) Voice emotion recognition method based on global perception cross-modal feature fusion network
CN110297906B (en) Method for generating interview report, computer-readable storage medium and terminal device
KR102109866B1 (en) System and Method for Expansion Chatting Corpus Based on Similarity Measure Using Utterance Embedding by CNN
CN114418327A (en) Automatic order recording and intelligent order dispatching method for customer service system
Zorrilla et al. Audio Embedding-Aware Dialogue Policy Learning
CN113673239A (en) Hotel comment emotion polarity classification method based on emotion dictionary weighting
CN108962281B (en) Language expression evaluation and auxiliary method and device
CN111159383A (en) Legal opinion book generation system based on target object
CN115292495A (en) Emotion analysis method and device, electronic equipment and storage medium
CN109298796B (en) Word association method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant