CN107193969B - Method for automatically generating novel text emotion curve and predicting recommendation - Google Patents
- Publication number
- CN107193969B (application CN201710377512.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- emotion
- novel
- matrix
- calculating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for automatically generating emotion curves of novel texts and making predictions and recommendations from them. The emotion curves generated by the method reflect changes in a text's emotion more accurately. The method innovatively predicts statistics of novels from the relationships between the emotion curves of novel texts, and its download-count predictions show a higher positive correlation with actual downloads. It also offers a new angle for recommending related novels. The method mainly comprises the following steps: preprocessing a corpus of novel texts to obtain each novel's word list, computing emotion scores over successive text windows, and assembling them into an emotion curve; computing the matrix of pairwise dynamic time warping (DTW) distances between texts from their emotion curves; predicting download counts from the DTW distance matrix with an improved Gaussian process; and recommending related texts according to the DTW distance.
Description
Technical Field
The invention belongs to the field of sentiment (emotion) analysis in natural language processing, and relates to a method for automatically generating emotion curves of novel texts and making predictions and recommendations from them.
Background
Psychological studies have shown that people tend to prefer stories with familiar patterns and to dislike storylines that run contrary to their own experience. Kurt Vonnegut argued that a story's emotional arc is the core of a novel's reading value, and that good novels tend to share similar patterns of emotional change. To analyze the emotional changes in a novel better, an emotion curve of the text must be generated and compared with those of other texts.
Emotion curve generation for novels is still at an exploratory stage. Although many sentiment evaluation methods exist for paragraphs and short texts, emotion curves (time series of emotion scores) are generally produced by a fairly generic combination of text sampling and emotion-dictionary lookup.
Prior work has not considered how comparing the emotion curves of novels could further support text analysis. Related work analyzes novels' emotion curves only qualitatively and, for convenience, compares fixed-length curves (time series with the same time resolution) with measures such as the Manhattan distance. In fact, data that each have their own "time axis" call for time-series analysis methods.
Traditionally, there has been no uniform, well-founded method for distance-based regression, especially when the distances are between curves, which differ from distances in ordinary Euclidean space. Although such distance measures can better capture the relationships between time series, some of them do not satisfy the triangle inequality, so traditional machine learning methods cannot easily be applied to them.
Disclosure of Invention
Purpose of the invention: existing analyses of novel texts do not consider the overall emotional change of a novel. To address this, the invention provides a method that comprehensively compares the similarities and differences in emotional change between texts and, through a machine learning process, predicts statistics of novels and makes recommendations.
To solve this technical problem, the invention discloses a method for automatically generating emotion curves of novel texts and making predictions and recommendations. All steps of the method run on a Windows platform. Curves are generated for a data set of novel texts from Project Gutenberg (www.gutenberg.org), and download-count predictions and recommendations are produced.
The Python toolkit spaCy (spacy.io) used in the invention is an open-source natural language processing toolkit developed by Explosion AI (twitter.…).
The labMT emotion word list (neo.imm.dtu.dk/wiki/LabMT) used in the invention is supplementary material to the paper by Peter Sheridan Dodds et al. (arxiv.org/abs/1101.5120v3). The labMT list was drawn from a broad range of data sets, and public emotion scores for its main words were collected through a crowdsourcing service. Each word received more than 50 independent ratings, so the word scores in the labMT dictionary are broad-based and objective.
The steps of the conventional Gaussian process used in the invention are prior art and are described in detail in Gaussian Processes for Machine Learning by C. E. Rasmussen et al. (MIT Press, 2006).
Computing the dynamic time warping distance between two time series is likewise existing technology from the field of time-series analysis, but it has rarely been used in natural language processing. The invention introduces it to compare the emotion curves of novel texts and, in the subsequent steps, corrects the resulting distance matrix so that it meets the requirements of the downstream model.
The method mainly comprises the following steps:
Step 1: generate the emotion curve of each novel from its text.
Step 2: compute the matrix of dynamic time warping (DTW) distances between every pair of emotion curves obtained in step 1.
Step 3: predict download counts with an improved Gaussian process, using the DTW distance matrix obtained in step 2.
Step 4: using the DTW distances obtained in step 2, sort the corresponding novels in ascending order of distance and output the titles of the closest novels as recommendations.
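Step 4 is a plain nearest-neighbour ranking. The sketch below assumes a precomputed, symmetric DTW distance matrix stored as nested lists and a parallel list of titles. The function name `recommend` and the cut-off `k` are illustrative and are not taken from the patent.

```python
from typing import List, Sequence


def recommend(target: int,
              dist: Sequence[Sequence[float]],
              titles: Sequence[str],
              k: int = 3) -> List[str]:
    """Return up to k titles ordered by ascending DTW distance to novel `target`."""
    row = dist[target]
    # Every other novel is a candidate; the target itself is excluded
    candidates = [j for j in range(len(titles)) if j != target]
    # Ascending distance: the most similar emotion curve comes first
    candidates.sort(key=lambda j: row[j])
    return [titles[j] for j in candidates[:k]]


# Toy distances between four novels (made-up numbers)
titles = ["A", "B", "C", "D"]
dist = [[0.0, 5.0, 1.0, 3.0],
        [5.0, 0.0, 2.0, 4.0],
        [1.0, 2.0, 0.0, 6.0],
        [3.0, 4.0, 6.0, 0.0]]
print(recommend(0, dist, titles, k=2))  # ['C', 'D']
```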
Step 1 comprises the following steps:
Step 1-1: tokenize the training texts and the target text with the Python natural language processing toolkit spaCy, and remove elements that do not count as effective words, such as punctuation marks and honorifics (e.g., Mr, Mrs), to obtain the word list of each text.
Step 1-2: divide the word list of the text into consecutive word windows and compute the average emotion score of each window in turn.
Step 1-3: arrange the emotion scores from step 1-2 in order to form a time series of emotion scores, and compute its moving average. The resulting moving-average series is taken as the novel's emotion curve.
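Two small pieces of step 1 can be sketched in plain Python. The sketch assumes tokenization has already been done (the patent uses spaCy for this). The punctuation test is a simplified stand-in for spaCy's `Token.is_punct`, and the honorific list is hypothetical. The moving-average `half_width` is also an assumption, because the patent does not state how many neighbouring points are averaged. Here each point is averaged with itself and its neighbours.

```python
from typing import Iterable, List, Sequence

# Hypothetical honorific list; the patent names only "Mr" and "Mrs" as examples
HONORIFICS = {"mr", "mrs", "ms", "dr", "miss"}


def build_word_list(tokens: Iterable[str]) -> List[str]:
    """Step 1-1 filtering: drop punctuation and honorifics from tokenizer output."""
    words = []
    for tok in tokens:
        # Stand-in for spaCy's Token.is_punct: no letters or digits means punctuation
        if not any(ch.isalnum() for ch in tok):
            continue
        # "Mr." and "Mr" are treated alike by stripping a trailing period
        if tok.lower().rstrip(".") in HONORIFICS:
            continue
        words.append(tok.lower())  # lower-casing is an assumption for lexicon lookup
    return words


def moving_average(scores: Sequence[float], half_width: int = 1) -> List[float]:
    """Step 1-3: centred moving average; windows shrink at the ends of the series."""
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


print(build_word_list(["Mr.", "Darcy", "smiled", ",", "and", "Mrs", "Bennet", "!"]))
# ['darcy', 'smiled', 'and', 'bennet']
print(moving_average([1.0, 2.0, 3.0, 4.0]))  # [1.5, 2.0, 3.0, 3.5]
```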
Step 1-2 comprises the following steps:
Step 1-2-1: divide the word list of the text evenly into text windows of size N_w.
Step 1-2-2: obtain an emotion-score mapping for common words from the labMT emotion word list, in the form of a function h_avg(w) mapping each word to its emotion score.
Step 1-2-3: in each text window, count the words that appear in the emotion-score mapping and their frequencies.
Step 1-2-4: compute the emotion score h_avg(T) of each text window T as
$$h_{\text{avg}}(T) = \frac{\sum_{i=1}^{N} h_{\text{avg}}(w_i)\, f_i(T)}{\sum_{i=1}^{N} f_i(T)}$$
where w_1, w_2, …, w_N are the distinct words in the window that appear in the emotion-score mapping, N is the number of such words, h_avg(w_i) is the emotion score of the i-th word w_i, and f_i(T) is the frequency of w_i in window T, for i = 1, …, N.
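The window score of step 1-2-4 is a frequency-weighted mean of lexicon scores. In the sketch below, a plain dict stands in for the labMT mapping h_avg(w), and the toy values are made up (labMT scores lie on a 1 to 9 scale). Returning `None` for a window with no lexicon words is an assumption, since the patent does not cover that case.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def window_score(window: Iterable[str], h_avg: Dict[str, float]) -> Optional[float]:
    """h_avg(T): frequency-weighted mean score of the lexicon words in window T."""
    # f_i(T): frequency of each distinct lexicon word w_i in the window
    freqs = Counter(w for w in window if w in h_avg)
    total = sum(freqs.values())
    if total == 0:
        return None  # assumption: a window with no lexicon words gets no score
    return sum(h_avg[w] * f for w, f in freqs.items()) / total


# Toy lexicon on labMT's 1-9 scale; these values are made up, not real labMT scores
lexicon = {"happy": 8.0, "sad": 2.0, "house": 5.0}
print(window_score(["happy", "happy", "sad", "the"], lexicon))  # 6.0
```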
Step 1-2-1 comprises the following steps:
Step 1-2-1-1: given the word list of a text and the window size N_w, compute the number of text windows l = L / N_w (rounded down), where L is the total length of the word list.
Step 1-2-1-2: compute the start position T_bj and end position T_ej of each text window as
$$T_{bj} = N_w \times j + 1, \qquad T_{ej} = N_w \times (j + 1),$$
where j = 0, 1, …, l − 1 and word positions are counted from 1.
Step 1-2-1-3: generate the text windows in order from the start and end positions of each window in the word list.
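Step 1-2-1 amounts to slicing the word list into l = ⌊L / N_w⌋ non-overlapping chunks. The sketch uses 0-based slice indices and takes j = 0, …, l − 1 so that the first window starts at the first word. It also drops any trailing remainder shorter than N_w. That treatment of the remainder is an assumption, because the patent does not address it.

```python
from typing import List, Sequence


def split_windows(words: Sequence[str], n_w: int) -> List[List[str]]:
    """Split a word list into l = len(words) // n_w windows of n_w words each."""
    l = len(words) // n_w
    windows = []
    for j in range(l):
        start = n_w * j        # 0-based form of T_bj = N_w * j + 1
        end = n_w * (j + 1)    # T_ej = N_w * (j + 1), used as an exclusive bound
        windows.append(list(words[start:end]))
    return windows


print(split_windows(list("abcdefg"), 3))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```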
Step 2 comprises the following steps:
Step 2-1: for each pair of novel texts, take the two emotion-score time series s_1, …, s_n and t_1, …, t_m of their emotion curves, where s_n is the n-th element of the series s_1, …, s_n, t_m is the m-th element of the series t_1, …, t_m, and n and m are natural numbers. Set the warping window size to w_1.
Step 2-2: initialize an (n+1) × (m+1) matrix DTW (short for dynamic time warping), indexed from bottom to top and then from left to right, with the bottom-left entry DTW[0,0] = 0 and all other entries set to +∞.
Step 2-3: visit the entries DTW[a,b] in order from bottom to top and from left to right, with a ranging from 1 to n and b from 1 to m. The first row and first column are skipped, as are entries with |a − b| > w_1. For each visited entry, take the minimum of its left, lower, and lower-left neighbours, add the distance between the corresponding series elements s_a and t_b, and replace the entry with this new value.
Step 2-4: return the top-right entry DTW[n,m] as the dynamic time warping distance between the two emotion-score time series. This entry is finite only if w_1 ≥ |n − m|.
Step 2-5: repeat steps 2-1 to 2-4 until the DTW distance between every pair of texts has been obtained, and arrange these distances into the DTW distance matrix.
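Steps 2-1 to 2-5 describe the standard dynamic-programming DTW recurrence with a Sakoe-Chiba-style window |a − b| ≤ w_1. Below is a minimal pure-Python sketch. It assumes the local cost is the absolute difference |s_a − t_b|, since the patent does not name the cost. It also widens w_1 to |n − m| when w_1 is too small for DTW[n, m] to be reached; this widening is an added assumption.

```python
import math
from typing import List, Sequence


def dtw_distance(s: Sequence[float], t: Sequence[float], w1: int) -> float:
    """DTW distance between two emotion curves with a warping window of w1."""
    n, m = len(s), len(t)
    # Assumption: widen the window if needed so DTW[n][m] stays reachable
    w1 = max(w1, abs(n - m))
    # (n+1) x (m+1) matrix: DTW[0][0] = 0, every other entry starts at +inf
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for a in range(1, n + 1):
        # Only cells with |a - b| <= w1 are filled in
        for b in range(max(1, a - w1), min(m, a + w1) + 1):
            cost = abs(s[a - 1] - t[b - 1])  # local distance |s_a - t_b|
            dtw[a][b] = cost + min(dtw[a - 1][b],      # advance in s only
                                   dtw[a][b - 1],      # advance in t only
                                   dtw[a - 1][b - 1])  # advance in both
    return dtw[n][m]


def dtw_matrix(curves: Sequence[Sequence[float]], w1: int) -> List[List[float]]:
    """Step 2-5: pairwise DTW distances; the diagonal is zero."""
    k = len(curves)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            # This DTW variant is symmetric, so each pair is computed once
            d[i][j] = d[j][i] = dtw_distance(curves[i], curves[j], w1)
    return d


print(dtw_distance([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0], w1=1))  # 0.0
```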
Step 3 comprises the following steps:
Step 3-1: take the logarithm of the download counts of the training texts to obtain the log download counts y of the training data.
Step 3-2: compute the minimum eigenvalue λ_min of the DTW distance matrix K of the training texts.
Step 3-3: take an input noise level σ_n² and correct it with the λ_min obtained in step 3-2: if λ_min ≥ 0, leave it unchanged; if λ_min < 0, increase σ_n² by −λ_min.
Step 3-4: with the corrected noise level σ_n², apply the conventional Gaussian process to predict the novels' download counts.
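Steps 3-2 and 3-3 need only the smallest eigenvalue of the symmetric training distance matrix. A NumPy sketch follows. The function name is illustrative, and the starting noise level in the example is an arbitrary value, since the patent leaves σ_n² as an input.

```python
import numpy as np


def corrected_noise(K: np.ndarray, sigma2: float) -> float:
    """Steps 3-2 and 3-3: raise the noise level by -lambda_min when K is indefinite."""
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    lam_min = float(np.linalg.eigvalsh(K)[0])
    return sigma2 - lam_min if lam_min < 0 else sigma2


# Toy 2x2 "distance matrix" (made-up values): its eigenvalues are +2 and -2
K = np.array([[0.0, 2.0], [2.0, 0.0]])
s2 = corrected_noise(K, 0.1)
print(round(s2, 6))                                      # 2.1
print(np.linalg.eigvalsh(K + s2 * np.eye(2)).min() > 0)  # True
```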
Step 3-4 comprises the following steps:
Step 3-4-1: input the DTW distance matrix K of the training texts, the log download counts y of the training novels (the target values), the corrected noise level σ_n², and the DTW distance matrix k_* between the target texts and the training texts.
Step 3-4-2: compute the Cholesky factor L_1 of the matrix K + σ_n² I, where I is the identity matrix:
$$L_1 = \operatorname{cholesky}\left(K + \sigma_n^2 I\right)$$
Step 3-4-3: compute the coefficient vector α applied to the kernel values k_*:
$$\alpha = L_1^{\top} \backslash \left(L_1 \backslash y\right)$$
where A \ B denotes the solution X of the linear system AX = B.
Step 3-4-4: compute the target log download count f_*:
$$f_* = k_*^{\top} \alpha$$
f_* is the predicted log download count; the predicted download count itself follows by exponentiation.
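Steps 3-4-1 to 3-4-4 match the predictive-mean lines of Algorithm 2.1 in Rasmussen and Williams, with the DTW distance matrix used directly as the kernel matrix as the patent describes. A NumPy sketch follows. Here `np.linalg.solve` stands in for a dedicated triangular solver, and the matrices and download counts are made-up numbers.

```python
import numpy as np


def gp_predict(K, y, k_star, sigma2):
    """Predictive mean f* = k*^T alpha of GP regression with kernel matrix K."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    k_star = np.asarray(k_star, dtype=float)
    # Step 3-4-2: lower-triangular Cholesky factor L1 of K + sigma2 * I
    L1 = np.linalg.cholesky(K + sigma2 * np.eye(K.shape[0]))
    # Step 3-4-3: alpha = L1^T \ (L1 \ y), i.e. two triangular solves
    alpha = np.linalg.solve(L1.T, np.linalg.solve(L1, y))
    # Step 3-4-4: one row of k_star per target text gives one prediction each
    return k_star @ alpha


# Made-up example: 3 training novels, 1 target novel
K = np.array([[0.0, 2.0, 3.0],
              [2.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
y = np.log([120.0, 450.0, 300.0])     # step 3-1: log download counts
k_star = np.array([[1.5, 0.5, 1.0]])  # DTW distances from target to training texts
sigma2 = 0.1 - min(0.0, np.linalg.eigvalsh(K)[0])  # corrected noise (steps 3-2, 3-3)
log_pred = gp_predict(K, y, k_star, sigma2)
print(np.exp(log_pred))  # back-transformed download prediction
```

Because this K has zero trace, it always has a negative eigenvalue. The correction in step 3-3 is therefore what makes the Cholesky factorization succeed here.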
By generating emotion curves whose length adapts to the length of the text, the invention avoids the information loss and redundancy that arise when emotion curves are produced by sampling a text at a resolution dictated by technical limitations or downstream needs. The curves it generates therefore reflect the emotional changes of a novel more accurately, and this accuracy is borne out in the subsequent tasks.
Through the improved Gaussian process of step 3, the invention makes it possible to use dynamic time warping distances to predict statistics of novels. Although the correction is simple, it is backed by theoretical proof and experimental verification. In theory, the correction in step 3 guarantees that the resulting matrix is positive definite. This makes the matrix a valid kernel and resolves the positive-definiteness problem of using DTW distances in a Gaussian process, and in kernel methods more generally. In 1000 simulation runs, even on random data, negative eigenvalues made up no more than 5% of the eigenvalues of the DTW distance matrix. Whenever negative eigenvalues occurred, the ratio of the largest positive eigenvalue to the magnitude of the most negative eigenvalue was always above 25. The correction therefore has little effect on the original distance structure while guaranteeing usability, and it fits seamlessly into the standard Gaussian process framework.
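The positive-definiteness guarantee claimed for step 3 follows from a short eigenvalue argument. Write the symmetric DTW matrix as K = QΛQ^⊤ with orthogonal Q and eigenvalues λ_i ≥ λ_min. Let σ_n² > 0 be the noise level before correction, and let σ̃_n² be the corrected level from step 3-3:

$$
K + \tilde{\sigma}_n^2 I = Q\,(\Lambda + \tilde{\sigma}_n^2 I)\,Q^{\top},
\qquad
\tilde{\sigma}_n^2 = \sigma_n^2 - \min(0, \lambda_{\min}),
$$
$$
\lambda_i + \tilde{\sigma}_n^2 \;\ge\; \max(\lambda_{\min}, 0) + \sigma_n^2 \;\ge\; \sigma_n^2 > 0 \quad \text{for all } i.
$$

Every eigenvalue of K + σ̃_n² I is therefore positive, and its Cholesky factorization in step 3-4-2 exists. If the starting noise level were σ_n² = 0 and λ_min < 0, the corrected matrix would only be positive semi-definite, so a strictly positive input noise level is needed.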
The invention pioneers the quantitative prediction of novel statistics from the relationships between the emotion curves of novel texts. Specifically, it characterizes the relationships between emotion curves with the dynamic time warping distance from time-series analysis, and it performs regression on novel download counts with an improved Gaussian process.
The improved Gaussian process in the disclosed method addresses a limitation of the prior art: a DTW distance matrix need not be positive definite and so cannot directly serve as a Gaussian process kernel. The improvement has been shown to be sound in theory and feasible in experiments. It is simple to implement and integrates seamlessly into the standard Gaussian process framework.
Beneficial effects: by generating emotion curves for given novel texts, the invention provides a useful reference for analyzing the emotional trends of novels. The improved Gaussian process accepts a wider range of distance functions as kernels. This broadens the applicability of Gaussian processes and indirectly improves the accuracy of related predictions. Predicting a target text's download count from the relationships between emotion curves is an entirely new approach, and its predictions have a stronger positive correlation with actual downloads than those of traditional methods. Comparing DTW distances also offers a new perspective on recommending related novels.
Drawings
These and other advantages of the invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of the method of the invention.
FIG. 2 shows an emotion curve generated by the invention.
FIG. 3 shows an emotion curve generated by a prior-art method.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in FIG. 1, the invention discloses a method for automatically generating emotion curves of novel texts, predicting download counts, and recommending the novels whose emotion curves are most similar. The method mainly comprises the following steps:
Step 11: tokenize the training texts and the target text with the Python natural language processing toolkit spaCy. Remove punctuation marks using spaCy's token annotations, and remove honorific abbreviations (e.g., Mr, Mrs) by text pattern matching, to obtain the word list of each text.
Step 12: divide the word list of the text evenly into text windows of size N_w.
Step 13: obtain an emotion-score mapping for common words from the labMT emotion word list, in the form of a function h_avg(w) mapping each word to its emotion score.
Step 14: in each text window, count the words that appear in the emotion-score mapping and their frequencies.
Step 15: compute the emotion score of each text window T as
$$h_{\text{avg}}(T) = \frac{\sum_{i=1}^{N} h_{\text{avg}}(w_i)\, f_i(T)}{\sum_{i=1}^{N} f_i(T)}$$
where w_1, w_2, …, w_N are the distinct words in the window that appear in the emotion-score mapping, N is the number of such words, h_avg(w_i) is the emotion score of word w_i, and f_i(T) is the frequency of w_i in window T.
Step 16: arrange the window scores in order to form a time series of emotion scores.
Step 17: compute the moving average of the emotion-score time series from step 16, i.e., replace the score at each point of the series with the mean of the scores at neighbouring points. The moving-average series is taken as the novel's emotion curve.
Step 18: compute the DTW distance matrix between every pair of emotion curves.
Step 19: take the logarithm of the training texts' download counts to obtain the target values y of the training data.
Step 21: take an input noise level σ_n² and correct it with the minimum eigenvalue λ_min of the training DTW distance matrix K (computed as in step 3-2): if λ_min ≥ 0, leave it unchanged; if λ_min < 0, increase σ_n² by −λ_min.
Step 22: with the corrected noise level σ_n², apply the conventional Gaussian process to predict the novels' download counts.
Step 23: sort the corresponding novels in ascending order of DTW distance and output the titles of the closest novels as recommendations.
Step 12 comprises the following steps:
Step 24: given the word list of a text and the window size N_w, compute the number of text windows l = L / N_w (rounded down), where L is the total length of the word list.
Step 25: compute the start position T_bj and end position T_ej of each text window as
$$T_{bj} = N_w \times j + 1, \qquad T_{ej} = N_w \times (j + 1),$$
where j = 0, 1, …, l − 1 and word positions are counted from 1.
Step 26: generate the text windows in order from the start and end positions of each window in the word list.
Step 18 comprises the following steps:
Step 27: for each pair of novel texts, take the two emotion-score time series s_1, …, s_n and t_1, …, t_m of their emotion curves, where s_n is the n-th element of the series s_1, …, s_n, t_m is the m-th element of the series t_1, …, t_m, and n and m are natural numbers. Set the warping window size to w_1.
Step 28: initialize an (n+1) × (m+1) matrix DTW (short for dynamic time warping), indexed from bottom to top and then from left to right, with the bottom-left entry DTW[0,0] = 0 and all other entries set to +∞.
Step 29: visit the entries DTW[a,b] in order from bottom to top and from left to right, with a ranging from 1 to n and b from 1 to m. The first row and first column are skipped, as are entries with |a − b| > w_1. For each visited entry, take the minimum of its left, lower, and lower-left neighbours, add the distance between the corresponding series elements s_a and t_b, and replace the entry with this new value.
Step 30: return the top-right entry DTW[n,m] as the dynamic time warping distance between the two emotion-score time series. This entry is finite only if w_1 ≥ |n − m|.
Step 31: repeat steps 27 to 30 until the DTW distance between every pair of texts has been obtained, and arrange these distances into the DTW distance matrix.
Step 22 comprises the following steps:
Step 32: input the DTW distance matrix K (used as the Gaussian process kernel matrix), the log download counts y of the training novels (the target values), the corrected noise level σ_n² from step 21, and the DTW distance matrix k_* between the target texts and the training texts.
Step 33: compute the Cholesky factor L_1 of the matrix K + σ_n² I, where I is the identity matrix:
$$L_1 = \operatorname{cholesky}\left(K + \sigma_n^2 I\right)$$
Step 34: compute the coefficient vector α applied to the kernel values k_*:
$$\alpha = L_1^{\top} \backslash \left(L_1 \backslash y\right)$$
where A \ B denotes the solution X of the linear system AX = B.
Step 35: compute the target log download count f_*:
$$f_* = k_*^{\top} \alpha$$
Examples
The algorithm used by the invention is implemented entirely in Python. The experimental configuration was an Intel(R) Core(TM) i5-4200M processor at 2.5 GHz with 4 GB of memory, running Python 3.5.3 from the Anaconda 3 distribution.
The experimental data were prepared as follows. The corpus consists of 1729 English novels from Project Gutenberg, each with more than 10,000 words and more than 100 monthly downloads. Statistics for each novel, including its title and download count, were obtained from the Project Gutenberg website.
Example 1
The experiment in this example, generating emotion curves of novel texts, is as follows:
11. Input the training and test text corpora and preprocess them to obtain the word list of each text.
12. Generate the emotion curve of each text from the word lists obtained in step 11, and, for comparison, also generate an emotion curve with a previous method.
Example 2
In this example, the experiment predicting download counts by comparing the emotion curves of novel texts is as follows:
11. Input the training and test text corpora and preprocess them to obtain the word list of each text.
12. Generate the emotion curve of each text from the word lists obtained in step 11.
13. Compute the DTW distance matrix between the emotion curves.
14. Predict the log download counts of the test texts from the distance matrix with the improved Gaussian process.
Example 3
In this example, the experiment recommending related texts by comparing the emotion curves of novel texts is as follows:
11. Input the training and test text corpora and preprocess them to obtain the word list of each text.
12. Generate the emotion curve of each text from the word lists obtained in step 11.
13. Compute the DTW distance matrix between the emotion curves.
14. Sort the related texts by their entries in the DTW distance matrix and recommend them in ascending order of distance.
The invention aims to improve the generation of emotion curves for novel texts and to make related predictions and recommendations. This requires a method that accurately reflects the emotional changes of the original text and improves the positive correlation of the predicted download counts. To verify its effectiveness, the invention was compared with a traditional emotion-curve generation method and with several traditional models.
The emotion curve generated by the invention is shown in FIG. 2, and the one generated by the conventional method is shown in FIG. 3. In both figures the vertical axis is the emotion score of the novel text and the horizontal axis is the position of the corresponding sampling window in the text (i.e., the time point in the emotion-score time series). Both figures show the emotion curve of Alice's Adventures in Wonderland. Although the invention uses fewer sampling windows (a lower time resolution), it captures the emotional changes of the novel better. In Alice's Adventures in Wonderland, the story changes sharply after about 80% of the text: the trial before the King and Queen expresses a great deal of negative emotion, and then Alice wakes from her dream and calm returns. The curve produced by the invention captures this change well, whereas the curve produced by the conventional method shows little more than the emotion regressing toward the mean. The improved accuracy of the emotion curve is also linked to, and consistent with, the higher correlation coefficient achieved by the downstream prediction model; drawing the emotion curve accurately serves the objective of better download-count prediction.
Table 1 is a comparison of the prediction of the amount of downloaded target text given by the modified gaussian process:
TABLE 1
The results of the invention are in the last row. Compared with traditional plain-text features and the curve-generation method of previous work, the invention's download-count predictions have a higher positive correlation with actual downloads.
Table 2 shows example recommendations based on emotion-curve similarity:
TABLE 2
As Table 2 shows, although the invention makes recommendations solely from emotion curves, the closest recommendation it returns for a novel is another edition (a revised text) of the same novel. This supports the soundness of recommending by emotion curves. Table 2 also lists other novels with close emotion curves, illustrating the practical usefulness of the invention.
The invention provides a method for automatically generating emotion curves of novel texts and making predictions and recommendations, and the emotion curves it generates reflect a text's emotional changes more accurately. Its download prediction method is new and differs from the prior art by exploiting the overall emotional change of a novel. Compared with traditional text-feature methods, it achieves a higher positive correlation when predicting the actual download counts of unseen texts. The novels whose emotion curves are closest, as recommended by the invention, are reasonable and distinctive recommendations, offering a new perspective for recommendation tasks involving novels.
The invention provides a method for automatically generating emotion curves of novel texts and making predictions and recommendations, and there are many ways to implement it; the above is only a preferred embodiment. Those skilled in the art can make various improvements and refinements without departing from the principle of the invention, and these should also be regarded as falling within its scope of protection. Components not specified in this embodiment can be implemented with existing technology.
Claims (5)
1. A method for automatically generating emotion curves of novel texts and making predictions and recommendations, characterized by comprising the following steps:
step 1, generating the emotion curve of a novel from its text;
step 2, calculating the dynamic time warping distance matrix between every pair of emotion curves obtained in step 1;
step 3, predicting the download count of the target text through an improved Gaussian process, using the dynamic time warping distance matrix obtained in step 2;
step 4, using the dynamic time warping distances obtained in step 2, sorting the corresponding novels in ascending order of distance and outputting the title of the closest novel as a recommendation;
step 1 comprises the following steps:
step 1-1, tokenizing the training texts and the target text of the novels with the Python natural language processing toolkit spaCy to obtain the word list of each text;
step 1-2, dividing the word list of a text into consecutive word windows and calculating the average emotion score of each word window in turn;
step 1-3, arranging the emotion scores obtained in step 1-2 in order to generate a time series of emotion scores, calculating the moving average of the time series, and taking the resulting moving-average series as the emotion curve of the novel;
step 1-2 comprises the following steps:
step 1-2-1, dividing the word list of the text evenly into text windows of size N_w;
step 1-2-2, obtaining an emotion-score mapping for common words from the labMT emotion word list, in the form of a function h_avg(w) mapping words to emotion scores;
step 1-2-3, counting the words in each text window that appear in the emotion-score mapping and their frequencies;
step 1-2-4, calculating the emotion score h_avg(T) of each text window T by the following formula:
$$h_{\text{avg}}(T) = \frac{\sum_{i=1}^{N} h_{\text{avg}}(w_i)\, f_i(T)}{\sum_{i=1}^{N} f_i(T)}$$
wherein w_1, w_2, …, w_N are the distinct words in the window that appear in the emotion-score mapping, N is the number of such words, h_avg(w_i) is the emotion score of the i-th word w_i, and f_i(T) is the frequency of w_i in window T, for i = 1, …, N.
2. The method of claim 1, wherein step 1-2-1 comprises the steps of:
step 1-2-1-1, aiming at a word list of a text and a size N of a text window needing to be generatedwCalculating the number L of text windows to be divided as L/NwWhere L is the total length of the word list of the text;
Step 1-2-1-2: compute the start position $T_{bj}$ and end position $T_{ej}$ of each text window as
$$T_{bj} = N_w \times j + 1, \qquad T_{ej} = N_w \times (j+1),$$
where $j = 1, \ldots, l$;
Step 1-2-1-3: generate the segmented text windows in order, using each window's start and end positions in the word list.
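Below is a Python sketch of steps 1-2-1-1 to 1-2-1-3. It makes two assumptions: the window count is rounded down, so leftover words that do not fill a window are dropped, and `j` runs from 0 to l−1, so the last window ends at position l·N_w ≤ L. The 1-based positions from the formulas are converted to 0-based Python slices. `split_windows` is a hypothetical name.

```python
from typing import List, Sequence

def split_windows(words: Sequence[str], n_w: int) -> List[Sequence[str]]:
    """Split a word list into l = floor(L / n_w) consecutive, non-overlapping
    windows of n_w words; trailing words that do not fill a window are dropped."""
    if n_w <= 0:
        raise ValueError("window size must be positive")
    l = len(words) // n_w
    windows = []
    for j in range(l):                      # assumption: j = 0 .. l-1
        t_b = n_w * j + 1                   # 1-based start position T_bj
        t_e = n_w * (j + 1)                 # 1-based end position T_ej
        windows.append(words[t_b - 1:t_e])  # 0-based slice covering positions t_b..t_e
    return windows

print(split_windows(list("abcdefg"), 3))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```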
3. The method of claim 2, wherein step 2 comprises the steps of:
Step 2-1: for pairwise matching of all novel texts, select in turn the emotion-score time series of two novels' emotion curves, $s_1 \ldots s_n$ and $t_1 \ldots t_m$. Here $s_n$ is the $n$-th element of $s_1 \ldots s_n$, $t_m$ is the $m$-th element of $t_1 \ldots t_m$, and $n$ and $m$ are natural numbers. Set the window size to $w_1$;
Step 2-2: initialize a matrix DTW of size $(n+1) \times (m+1)$, traversed from bottom to top and then from left to right. The bottom-left element DTW[0, 0] is 0 and all other elements are positive infinity;
Step 2-3: visit the matrix elements at indices $a$ and $b$ in order, from bottom to top and from left to right, where $a$ ranges from 1 to $n$ and $b$ from 1 to $m$. The first row and first column of the matrix are not visited, and elements with $|a - b| > w_1$ are also skipped;
for each visited element, take the minimum of its adjacent left, lower and lower-left elements, add the distance between the corresponding time-series elements $s_a$ and $t_b$, and replace the current element's value with this sum;
Step 2-4: return the top-right element DTW[n, m] of the matrix as the dynamic time warping distance between the two emotion-score time series;
Step 2-5: repeat steps 2-1 to 2-4 until the dynamic time warping distance between every pair of texts has been obtained, and arrange these distances into a dynamic time warping distance matrix.
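Below is a Python sketch of the windowed DTW in steps 2-1 to 2-5 and of the pairwise distance matrix. It makes three assumptions that the claims do not spell out. The local distance between $s_a$ and $t_b$ is taken to be the absolute difference. The band is widened to at least $|n - m|$, as in common windowed-DTW formulations, so that DTW[n, m] stays reachable. The diagonal of the pairwise matrix is set to 0. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def dtw_distance(s: Sequence[float], t: Sequence[float], w1: int) -> float:
    """Windowed DTW between two emotion-score series; cells with |a - b| > w
    are skipped and remain +inf."""
    n, m = len(s), len(t)
    w = max(w1, abs(n - m))  # assumption: widen the band so DTW[n][m] is reachable
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]  # (n+1) x (m+1), all +inf
    dtw[0][0] = 0.0
    for a in range(1, n + 1):                            # row 0 is never visited
        for b in range(max(1, a - w), min(m, a + w) + 1):  # column 0 never visited
            cost = abs(s[a - 1] - t[b - 1])  # assumed local distance |s_a - t_b|
            dtw[a][b] = cost + min(dtw[a - 1][b],       # neighbour at a-1
                                   dtw[a][b - 1],       # neighbour at b-1
                                   dtw[a - 1][b - 1])   # diagonal neighbour
    return dtw[n][m]

def dtw_matrix(curves: Sequence[Sequence[float]], w1: int) -> List[List[float]]:
    """Step 2-5: symmetric matrix of pairwise DTW distances between emotion curves."""
    k = len(curves)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(curves[i], curves[j], w1)
    return d

curves = [[1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
D = dtw_matrix(curves, w1=1)
```

In this toy example the first two curves differ only by a repeated point, so their DTW distance is 0. The reversed third curve is farther away.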
4. A method according to claim 3, characterized in that step 3 comprises the steps of:
Step 3-1: take the logarithm of the actual download counts of the training texts to obtain their log download counts $y$;
Step 3-2: compute the minimum eigenvalue $\lambda_{min}$ of the dynamic time warping distance matrix $K$ generated for the training texts;
Step 3-3: input a noise level and correct it using $\lambda_{min}$ from step 3-2. If $\lambda_{min} > 0$, leave it unchanged; if $\lambda_{min} < 0$, add $-\lambda_{min}$ to the noise level;
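Below is a NumPy sketch of steps 3-1 to 3-3. The natural logarithm is assumed for step 3-1 because the claim does not fix the base. `K` is assumed to be symmetric, as a pairwise DTW matrix is, and the helper names are hypothetical. Adding $-\lambda_{min}$ gives $K$ plus (noise level × identity) a smallest eigenvalue equal to the input noise level. A positive input therefore keeps that matrix positive definite for the later Cholesky step.

```python
import numpy as np

def log_targets(downloads):
    """Step 3-1: natural log of the raw download counts (counts assumed > 0)."""
    return np.log(np.asarray(downloads, dtype=float))

def corrected_noise(K, noise):
    """Steps 3-2/3-3: if the symmetric matrix K has a negative smallest
    eigenvalue lambda_min, add -lambda_min to the noise level."""
    lam_min = np.linalg.eigvalsh(np.asarray(K, dtype=float)).min()
    return noise - lam_min if lam_min < 0 else noise

K_toy = [[0.0, 3.0], [3.0, 0.0]]       # toy DTW matrix; eigenvalues are -3 and 3
noise = corrected_noise(K_toy, 0.1)    # about 3.1
```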
5. The method of claim 4, wherein step 3-4 comprises the steps of:
Step 3-4-1: input the dynamic time warping distance matrix $K$, the log download counts $y$ of the training novel texts (the training targets), the corrected noise level, and the dynamic time warping distance matrix $k_*$ from the target text to the training texts;
Step 3-4-2: compute the Cholesky factor $L_1$ of the matrix formed by adding the corrected noise level times the identity matrix $I$ to $K$;
Step 3-4-3: compute the coefficient matrix $\alpha$ for the kernel $k_*$:
$$\alpha = L_1^{\top} \backslash \left( L_1 \backslash y \right)$$
where the operator $A \backslash B$ denotes solving the linear system $AX = B$ for $X$;
Step 3-4-4: compute the target log download count $f_*$:
$$f_* = k_*^{\top} \alpha$$
$f_*$ is the predicted download count on the logarithmic scale.
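Steps 3-4-1 to 3-4-4 follow the usual Cholesky-based recipe for the predictive mean of Gaussian process regression, with the DTW matrices in place of the kernel matrices. Below is a NumPy sketch. `gp_predict` is a hypothetical name, and the toy inputs are made up. For brevity, `np.linalg.solve` performs the two triangular solves instead of a dedicated triangular solver.

```python
import numpy as np

def gp_predict(K, y, noise, k_star):
    """Predictive log download count f_* = k_*^T alpha.

    K: (n, n) DTW distance matrix of the training texts
    y: (n,) log download counts of the training texts
    noise: corrected noise level from step 3-3
    k_star: (n,) or (n, p) DTW distances from target text(s) to the training texts
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    k_star = np.asarray(k_star, dtype=float)
    n = K.shape[0]
    L1 = np.linalg.cholesky(K + noise * np.eye(n))  # lower triangular: L1 @ L1.T = K + noise*I
    alpha = np.linalg.solve(L1.T, np.linalg.solve(L1, y))  # alpha = L1^T \ (L1 \ y)
    return k_star.T @ alpha  # f_* = k_*^T alpha

K_train = [[0.0, 3.0], [3.0, 0.0]]     # toy DTW matrix for two training novels
y_train = np.log([100.0, 1000.0])      # toy log download counts
pred = gp_predict(K_train, y_train, 3.1, [1.0, 2.0])  # 3.1: toy corrected noise level
```

Taking `np.exp(pred)` would map the prediction back to a raw download count, provided the natural log was used in step 3-1.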
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710377512.3A CN107193969B (en) | 2017-05-25 | 2017-05-25 | Method for automatically generating novel text emotion curve and predicting recommendation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107193969A (en) | 2017-09-22 |
CN107193969B (en) | 2020-06-02 |
Family
ID=59875474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710377512.3A Active CN107193969B (en) | 2017-05-25 | 2017-05-25 | Method for automatically generating novel text emotion curve and predicting recommendation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107193969B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI661319B (en) * | 2017-11-30 | 2019-06-01 | 財團法人資訊工業策進會 | Apparatus, method, and computer program product thereof for generating control instructions based on text |
CN110413982B (en) * | 2018-04-27 | 2022-09-27 | 北京海马轻帆娱乐科技有限公司 | Text processing method and device |
CN110427485B (en) * | 2018-04-27 | 2022-07-05 | 北京海马轻帆娱乐科技有限公司 | Literature and literature classification method and apparatus |
JP2019219830A (en) * | 2018-06-18 | 2019-12-26 | 株式会社コミチ | Emotion evaluation method |
CN111653319A (en) * | 2020-06-17 | 2020-09-11 | 四川大学 | Method for constructing biomedical heterogeneous information network by fusing multi-source data |
CN113553423B (en) * | 2021-07-05 | 2023-10-10 | 北京奇艺世纪科技有限公司 | Scenario information processing method and device, electronic equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7983910B2 (en) * | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
CN102495873B (en) * | 2011-11-30 | 2013-04-10 | 北京航空航天大学 | Video recommending method based on video affective characteristics and conversation models |
CN103324662B (en) * | 2013-04-18 | 2016-12-28 | 中国科学院计算技术研究所 | The method for visualizing of the dynamic viewpoint differentiation of Social Media event and equipment |
CN105243448A (en) * | 2015-10-13 | 2016-01-13 | 北京交通大学 | Method and device for predicting evolution trend of internet public opinion |
CN106127220A (en) * | 2016-06-01 | 2016-11-16 | 苏州大学 | A kind of time series classification method and device |
2017-05-25: application CN201710377512.3A filed in CN (patent CN107193969B, status Active)
Non-Patent Citations (1)
Title |
---|
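A corpus-based study of the translator's emotional fingerprint from the perspective of translation stylistics: a study based on a self-built corpus of attitude and stance markers; 司炳月 et al.; 《外语电化教学》; 2014-03-31 (Issue 156); pp. 55-59 * |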
Also Published As
Publication number | Publication date |
---|---|
CN107193969A (en) | 2017-09-22 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |