CN117493570A - News emotion prediction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117493570A
CN117493570A (application CN202311500108.2A)
Authority
CN
China
Prior art keywords
news
word
text
feature
paragraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311500108.2A
Other languages
Chinese (zh)
Inventor
刘经友
余家鑫
蓝飘
张承业
田丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GRG Banking IT Co Ltd
Original Assignee
GRG Banking IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GRG Banking IT Co Ltd filed Critical GRG Banking IT Co Ltd
Priority to CN202311500108.2A
Publication of CN117493570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a news emotion prediction method and device, an electronic device, and a storage medium, belonging to the field of artificial intelligence. The method comprises the following steps: acquiring a news text to be processed; determining word structure weights and word frequency features of feature words in the news text; generating a text vector of the news text based on the word structure weights and word frequency features of the feature words; and determining the emotion type of the news text based on the text vector. Because the word structure weights are constructed with the structural factors of the news text as variables, the method fits and highlights the special structural features of news text. It requires no truncation that loses part of the content features, does not depend on the quality of an external dictionary, supports emotion prediction for long news, and effectively improves the prediction accuracy of news emotion.

Description

News emotion prediction method and device, electronic equipment and storage medium
Technical Field
The application belongs to the field of artificial intelligence, and particularly relates to a news emotion prediction method, a news emotion prediction device, electronic equipment and a storage medium.
Background
With the advent of the internet and social media, people can easily acquire a large amount of news information. Given this vast volume, however, how to quickly and accurately analyze and filter the information becomes particularly important. In general, news emotion analysis can help improve news service quality, improve information organization and retrieval efficiency, assist decision-making and investment, and reflect public opinion and attitudes, so it has wide application value.
At present, news emotion prediction constructs its weight features through generic technical logic, and the prediction results are therefore inaccurate.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. It therefore provides a news emotion prediction method and device, an electronic device, and a storage medium that build word structure weights with the structural factors of the news text as variables, so that the special structural features of news text are fitted and highlighted and the prediction accuracy of news emotion is effectively improved.
In a first aspect, the present application provides a news emotion prediction method, including:
acquiring news text to be processed;
determining word structure weights and word frequency characteristics of feature words in the news text based on the news text;
Generating a text vector of the news text based on the word structure weight of the feature word and the word frequency feature;
and determining the emotion type of the news text based on the text vector.
According to this news emotion prediction method, the word structure weight and word frequency feature of each feature word in the news text are calculated from the structural characteristics of the news, a text vector is constructed, and emotion prediction is performed on the basis of that vector. Because the word structure weights take the structural factors of the news text as variables, the method better fits and highlights the special structural features of news text, requires no truncation that loses part of the content features, does not depend on the quality of an external dictionary, supports emotion prediction for long news, and effectively improves the prediction accuracy of news emotion.
According to one embodiment of the present application, the word structure weights are determined by:
determining at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient and a sentence type coefficient corresponding to the feature word based on the news text;
and determining the word structure weight of the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient corresponding to the feature word.
According to one embodiment of the present application, the determining the word structure weight of the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient, and the sentence type coefficient corresponding to the feature word includes:
determining a paragraph weight corresponding to each paragraph of the news text by the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient of the feature word;
generating the word structure weight of the feature word based on the number of feature words and the sum of all the paragraph weights of the feature words.
According to one embodiment of the present application, after the obtaining of the news text to be processed and before the determining of the word structure weights and word frequency features of the feature words in the news text based on the news text, the method further includes:
word segmentation processing is carried out on the news text;
and performing stop word filtering on the news text after word segmentation, and determining characteristic words in the news text after stop word filtering, wherein the news text after stop word filtering has the same paragraph structure as the news text to be processed.
According to one embodiment of the present application, the determining, based on the text vector, an emotion type of the news text includes:
inputting the text vector into a vector machine of a news emotion prediction model;
and classifying emotion tendencies of the text vectors based on the vector machine, and determining the emotion type of the news text, wherein the news emotion prediction model is trained based on a news sample set and emotion type labels.
According to one embodiment of the present application, the news emotion prediction model further includes a first layer and a second layer, and determining, based on the news text, a word structure weight and a word frequency feature of a feature word in the news text includes:
inputting the news text into a news emotion prediction model;
based on the first layer, feature extraction and weight calculation of feature words are carried out on the news text, and the word structure weight and the word frequency feature of the feature words in the news text are determined;
generating a text vector of the news text based on the word structure weight of the feature word and the word frequency feature, including:
and summing the product of the word structure weight and the word frequency characteristic of each characteristic word based on the second layer to obtain the text vector.
In a second aspect, the present application provides a news emotion prediction apparatus, the apparatus comprising:
the acquisition module is used for acquiring news texts to be processed;
the first processing module is used for determining word structure weights and word frequency characteristics of feature words in the news text based on the news text;
the second processing module is used for generating a text vector of the news text based on the word structure weight of the feature word and the word frequency feature;
and the third processing module is used for determining the emotion type of the news text based on the text vector.
According to this news emotion prediction device, the word structure weight and word frequency feature of each feature word in the news text are calculated from the structural characteristics of the news, a text vector is constructed, and emotion prediction is performed on the basis of that vector. Because the word structure weights take the structural factors of the news text as variables, the device better fits and highlights the special structural features of news text, requires no truncation that loses part of the content features, does not depend on the quality of an external dictionary, supports emotion prediction for long news, and effectively improves the prediction accuracy of news emotion.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the news emotion prediction method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a news emotion prediction method as described in the first aspect above.
In a fifth aspect, the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the news emotion prediction method according to the first aspect.
In a sixth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a news emotion prediction method as described in the first aspect above.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is a schematic flow chart of a news emotion prediction method according to an embodiment of the present application;
FIG. 2 is a second schematic flowchart of a news emotion prediction method provided by an embodiment of the present application;
fig. 3 is a schematic structural diagram of a news emotion prediction device provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type and not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
News emotion prediction in the related art falls mainly into two classes. One class uses an emotion dictionary together with a support vector machine for text emotion analysis, but this requires manually collecting emotion words in advance to construct the support vector machine's weights, so the prediction depends heavily on dictionary quality. The other class uses a pre-trained language representation model (Bidirectional Encoder Representations from Transformers, BERT) for emotion analysis, but the length of BERT training text is limited, and overly long text demands excessive machine resources for training and inference; in practice only a fixed-length head and tail of the content, typically 512 tokens, is kept, and the resulting content loss makes the features inaccurate. Both methods construct weight features through generic technical logic and neglect the characteristics of news, leading to inaccurate emotion classification of news.
Aiming at the problems in the related art, the news emotion prediction method provided by the application can calculate the word structure weight and word frequency characteristics of each characteristic word in the news text according to the structural characteristics of the news, and construct text vectors to realize emotion prediction of news based on the text vectors.
According to this news emotion prediction method, word structure weights are constructed with the structural factors of the news text as variables, so the special structural features of news text are better fitted and highlighted. No truncation that loses part of the content features is needed, the quality of an external dictionary is not relied on, emotion prediction for long news can be supported, and the prediction accuracy of news emotion is effectively improved.
The news emotion prediction method, the news emotion prediction device, the electronic device and the readable storage medium provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
The news emotion prediction method can be applied to a terminal, and can be specifically executed by hardware or software in the terminal.
The terminal includes, but is not limited to, a portable communication device such as a mobile phone or tablet having a touch sensitive surface (e.g., a touch screen display and/or a touch pad). It should also be appreciated that in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touch pad).
In the following various embodiments, a terminal including a display and a touch sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and joystick.
The execution subject of the news emotion prediction method provided by the embodiment of the present application may be an electronic device or a functional module or a functional entity capable of implementing the news emotion prediction method in the electronic device, where the electronic device mentioned in the embodiment of the present application includes, but is not limited to, a mobile phone, a tablet computer, a camera, a wearable device, and the like, and the news emotion prediction method provided by the embodiment of the present application is described below by taking the electronic device as an execution subject.
As shown in fig. 1, the news emotion prediction method includes: steps 110 to 140.
Step 110, obtaining news text to be processed.
It will be appreciated that the news text to be processed may include a title and a body, the body may have one or more paragraphs, each paragraph having one or more sentences therein, and that punctuation marks at the end of each sentence may be a period, question mark, ellipsis or exclamation mark.
Step 120, determining word structure weights and word frequency characteristics of feature words in the news text based on the news text.
Wherein, the feature words can include: valid words, professional words, named entities, numbers and time words, adjectives and adverbs, verb and verb phrases, interjections and mood words, and the like.
Valid words carry actual meaning and information in text processing; they express the subject and content of a text well and play an important role in tasks such as text classification, emotion analysis, and information retrieval.
It should be noted that the word structure weights may be determined based on the structure and location of the feature words in the news text, and the word frequency features may be characterized using the inverse document frequency (Inverse Document Frequency, IDF) of the feature words.
In the actual execution process, a plurality of feature words are determined in the news text, the word structure weight of each feature word is obtained according to the position of each feature word in the structure of the news text, and the word frequency feature of each feature word is obtained according to the frequency of each feature word.
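The word frequency feature described above can be illustrated with a minimal IDF computation. This is a sketch only: the function name, the toy documents, and the +1 smoothing term in the denominator are illustrative assumptions, not taken from the application itself.

```python
import math

def inverse_document_frequency(term, documents):
    """Inverse document frequency of a term over a document collection.

    Each document is a list of words. The +1 smoothing in the
    denominator is an illustrative choice to avoid division by zero.
    """
    doc_count = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + doc_count))

# Toy collection: "bank" appears in 2 of 3 documents, "rain" in 1 of 3.
docs = [["bank", "profit", "rise"], ["bank", "loss"], ["weather", "rain"]]
idf_bank = inverse_document_frequency("bank", docs)
idf_rain = inverse_document_frequency("rain", docs)
assert idf_rain > idf_bank  # rarer terms receive a higher IDF
```

Rarer feature words thus contribute more strongly to the word frequency feature, which is the usual motivation for using IDF.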
And 130, generating a text vector of the news text based on the word structure weight and the word frequency characteristic of the characteristic word.
In actual execution, for a news text, the position and number of occurrences of each feature word in the news text are counted, and corresponding weighting is applied according to the word structure weight and word frequency feature of each feature word to obtain the feature representation of each feature word.
These feature representations may be represented as a vector, the length of which is the length of the vocabulary.
And integrating the characteristic representation of each characteristic word into a vector, wherein the vector is the text vector of the whole news text.
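The per-word combination described in steps above can be sketched as follows. All names and toy values here are illustrative; the application states only that the word structure weight and word frequency feature of each feature word are combined into one vector whose length is the vocabulary size.

```python
def build_text_vector(vocabulary, structure_weights, idf):
    """Combine word structure weight and word frequency (IDF) feature
    into one text vector, one dimension per vocabulary word.

    Words that never occur in the news text default to weight 0.
    """
    return [structure_weights.get(w, 0.0) * idf.get(w, 0.0) for w in vocabulary]

vocab = ["bank", "profit", "rain"]
weights = {"bank": 1.5, "profit": 1.0}          # word structure weights
idf = {"bank": 0.4, "profit": 0.9, "rain": 1.1}  # word frequency features
vec = build_text_vector(vocab, weights, idf)
assert len(vec) == len(vocab)
assert vec[2] == 0.0  # "rain" does not occur, so its dimension is zero
```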
And 140, determining the emotion type of the news text based on the text vector.
Emotion types may include positive, negative, and neutral tendencies, and the classification of emotion types can be flexibly divided according to actual requirements.
In actual execution, after the text vector is obtained, the emotion type of the news text can be predicted with a model based on machine learning algorithms such as naive Bayes, support vector machines, or random forests. This process may be implemented with a tool library such as the sklearn machine learning toolkit.
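The classification step can be sketched without external dependencies. Note the hedge: this uses a nearest-centroid rule as a dependency-free stand-in for the support vector machine and the other classifiers named above, and the centroids and labels are invented toy data, not from the application.

```python
def predict_emotion(text_vector, centroids):
    """Nearest-centroid stand-in for the emotion classifier.

    The application trains an SVM (e.g. via sklearn) on labelled text
    vectors; here each emotion class is represented by one toy centroid
    and the closest centroid's label is returned.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(text_vector, centroids[label]))

centroids = {
    "negative": [0.9, 0.1, 0.0],
    "neutral":  [0.1, 0.8, 0.1],
    "positive": [0.0, 0.2, 0.9],
}
assert predict_emotion([0.85, 0.1, 0.05], centroids) == "negative"
```

With sklearn available, the same step would be a `SVC.fit` / `SVC.predict` pair on the training text vectors and emotion labels.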
According to the news emotion prediction method provided by this embodiment of the application, the word structure weight and word frequency feature of each feature word in the news text are calculated from the structural characteristics of the news, a text vector is constructed, and emotion prediction is performed on the basis of that vector. Because the word structure weights take the structural factors of the news text as variables, the method better fits and highlights the special structural features of news text, requires no truncation that loses part of the content features, does not depend on the quality of an external dictionary, supports emotion prediction for long news, and effectively improves the prediction accuracy of news emotion.
In some embodiments, after obtaining the news text to be processed, before determining the word structure weights and word frequency features of the feature words in the news text based on the news text, the method further comprises:
word segmentation processing is carried out on the news text;
and performing stop word filtering on the segmented news text, determining characteristic words in the news text subjected to stop word filtering, wherein the news text subjected to stop word filtering has the same paragraph structure as the news text to be processed.
In actual execution, a word segmentation library such as the jieba segmentation tool or the Natural Language Toolkit (NLTK) is used to segment the news text to be processed, and Stop Words in the news text are filtered out while the original paragraph arrangement and punctuation of the news text are preserved, which makes it convenient to calculate the word structure weight of each feature word.
Stop words are usually functional or connective words that appear frequently in news text but contribute little to its subject and content; filtering them out is usually done to improve the quality of text features or reduce their dimensionality.
After the stop words are filtered out by the news text, the rest words are characteristic words.
In this embodiment, filtering the stop words out of the news text reduces noise and redundant information in the news text, improving the effect and efficiency of subsequent processing.
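The preprocessing described above, filtering stop words while keeping the paragraph structure intact, can be sketched as follows. The word lists and stop-word set are toy data; in practice the segmentation itself would come from a tool such as jieba or NLTK.

```python
def filter_stop_words(paragraphs, stop_words):
    """Remove stop words from segmented text, paragraph by paragraph.

    `paragraphs` is a list of paragraphs, each a list of already-
    segmented words. The paragraph structure of the input is preserved,
    as required for the later word-structure-weight calculation.
    """
    return [[w for w in para if w not in stop_words] for para in paragraphs]

stop = {"the", "a", "of"}
segmented = [["the", "bank", "reported", "a", "profit"], ["shares", "of", "rose"]]
filtered = filter_stop_words(segmented, stop)
assert filtered == [["bank", "reported", "profit"], ["shares", "rose"]]
assert len(filtered) == len(segmented)  # paragraph count unchanged
```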
In some embodiments, the news emotion prediction model further includes a first layer and a second layer, and determining the word structure weights and word frequency features of the feature words in the news text based on the news text includes:
inputting news texts into a news emotion prediction model;
based on the first layer, feature extraction and weight calculation of feature words are carried out on the news text, and word structure weights and word frequency features of the feature words in the news text are determined;
generating a text vector of the news text based on the word structure weight and the word frequency feature of the feature word, comprising:
and summing the product of the word structure weight and the word frequency characteristic of each characteristic word based on the second layer to obtain a text vector.
In actual execution, after the news text is input into the news emotion prediction model, the first layer performs word segmentation and stop-word filtering on the news text and extracts the feature words. On one hand, the word frequency feature of each feature word in the news text is determined in a dictionary-of-objects array structure; on the other hand, at least one of the paragraph position coefficient, paragraph length coefficient, sentence position coefficient, and sentence type coefficient is calculated for each feature word to obtain its word structure weight in the news text.
After receiving the word structure weights and word frequency features output by the first layer, the second layer of the news emotion prediction model calculates the paragraph weight of each feature word in each paragraph of the news text and sums them to obtain the text vector of the news text.
In this embodiment, the artificial-intelligence-based news emotion prediction model processes the news text rapidly, obtaining text vectors that are convenient to classify and providing a basis for predicting the emotion type of the news text.
In some embodiments, the word structure weights are determined by:
determining at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient and a sentence type coefficient corresponding to the feature word based on the news text;
the word structure weight of the feature word is determined based on at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient, and a sentence type coefficient to which the feature word corresponds.
In actual execution, the first layer calculates a paragraph position coefficient Wpp, a paragraph length coefficient Wpl, a sentence position coefficient Wsp, and a sentence type coefficient Wst for each feature word in the news text.
(1) The paragraph position coefficient Wpp of each feature word in the news text is determined. Wpp of the first and last paragraphs is set to 1; the headline, being important in news, gets Wpp of 1.5; Wpp of the middle paragraphs then decreases with the percentage distance from the first paragraph, the main purpose being to better spread out the differences between paragraph positions.
The paragraph position coefficient Wpp of each feature word is calculated as follows:
Wpp=1.5*isTitle+1*(1-abs(paragraphPosition))*(1-distancePercentage)
where isTitle indicates whether the feature word is in the title of the news text, taking the value 0 if it is not; paragraphPosition represents the position of the paragraph containing the feature word, taking the value 1 for the first paragraph, 0 for a middle paragraph, and -1 for the last paragraph; and distancePercentage represents the percentage distance of that paragraph from the first or last paragraph, ranging from 0 to 1, with larger values meaning the paragraph is closer to the middle.
(2) For paragraph length, the average length corresponds to a coefficient of 1, and the paragraph length coefficient is calculated as a percentage of the average length.
The paragraph length coefficient of each feature word is Wpl, and the calculation formula is as follows:
Wpl=paragraphLength/averageLength
where paragraphLength represents the length of the paragraph containing the feature word, and averageLength represents the average paragraph length of the news text; with Lj denoting the length of the paragraph with sequence number j (j = 1, 2, 3, …, n) and n paragraphs in the whole news text, averageLength is the mean of the Lj.
(3) A sentence position coefficient Wsp is assigned according to the position of the sentence containing the feature word: if that sentence is at the beginning or the end, Wsp is 1.5; otherwise Wsp is 1.
The sentence position coefficient of each feature word is Wsp, and the calculation formula is as follows:
Wsp=1.5*isStart+1.5*isEnd+1*(1-isStart)*(1-isEnd)
where isStart indicates whether the sentence containing the feature word is at the beginning, taking the value 1 if so and 0 if not; and isEnd indicates whether that sentence is at the end, taking the value 1 if so and 0 if not.
(4) A sentence type coefficient Wst is assigned according to the type of the sentence containing the feature word: if the sentence is a question or an exclamation, Wst is 1.5; otherwise Wst is 1.
The sentence type coefficient of each feature word is Wst, and the calculation formula is as follows:
Wst=1.5*sentenceType+1*(1-sentenceType)
where sentenceType indicates the sentence type of the sentence in which the feature word is located: if the sentence is a question or an exclamation, sentenceType takes a value of 1; otherwise sentenceType takes a value of 0.
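The two sentence-level coefficients Wsp and Wst defined above can be mirrored as small helpers (names are illustrative):

```python
def sentence_position_coefficient(is_start, is_end):
    # Wsp = 1.5*isStart + 1.5*isEnd + 1*(1-isStart)*(1-isEnd)
    # is_start / is_end: 1 if the sentence is at the beginning / end, else 0.
    return 1.5 * is_start + 1.5 * is_end + 1 * (1 - is_start) * (1 - is_end)

def sentence_type_coefficient(sentence_type):
    # Wst = 1.5*sentenceType + 1*(1-sentenceType)
    # sentence_type: 1 for a question or exclamation sentence, else 0.
    return 1.5 * sentence_type + 1 * (1 - sentence_type)
```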
In this embodiment, according to the structural features of multiple layers of the news text, corresponding paragraph position coefficients, paragraph length coefficients, sentence position coefficients and sentence type coefficients are given to feature words at different positions, so as to provide a basis for obtaining word structure weights more in line with the news text.
In some embodiments, determining the word structure weight of the feature word based on at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient, and a sentence type coefficient to which the feature word corresponds includes:
determining paragraph weights corresponding to the feature words in each paragraph of the news text based on at least one of the paragraph position coefficients, the paragraph length coefficients, the sentence position coefficients and the sentence type coefficients of the feature words;
word structure weights for the feature words are generated based on the number of feature words and the sum of all paragraph weights for the feature words.
In an actual implementation, there are m feature words i (i=1, 2, 3, …, m) in paragraph j, and the paragraph weight of feature word i in paragraph j is denoted TPF(i, j).
If the word structure weight of the feature word i is obtained based on the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient, the calculation formula of the paragraph weight TPF (i, j) of the feature word i in the paragraph j is as follows:
The number of the feature words i in the whole news text is Nd, j (j=1, 2,3, …, n) represents a paragraph, the word structure weight of the feature words i in the whole news text is denoted as TDFi, and the calculation formula is as follows:
so far, the first layer calculation is finished, and the word structure weight TDFi of each feature word i is sent to the second layer.
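The first-layer aggregation described above can be sketched as follows. The patent's exact formulas are not reproduced in this text, so the sketch assumes that TPF(i, j) sums the product Wpp·Wpl·Wsp·Wst over every occurrence of feature word i in paragraph j, and that TDFi divides the summed paragraph weights by the occurrence count Nd — a reconstruction from the surrounding description, not a quotation:

```python
def paragraph_weight(occurrence_coeffs):
    # TPF(i, j): assumed sum of Wpp*Wpl*Wsp*Wst over each occurrence of
    # feature word i in paragraph j; occurrence_coeffs holds one
    # (Wpp, Wpl, Wsp, Wst) tuple per occurrence.
    return sum(wpp * wpl * wsp * wst for wpp, wpl, wsp, wst in occurrence_coeffs)

def word_structure_weight(paragraph_weights, nd):
    # TDFi: assumed sum of TPF(i, j) over all paragraphs j, normalized by Nd,
    # the number of occurrences of feature word i in the whole news text.
    return sum(paragraph_weights) / nd
```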
The calculation formula of the second layer is as follows:
k feature words exist in the whole news text, the text vector D of the news text can be expressed by all feature words i (i=1, 2,3, …, k), and the calculation formula of the text vector D is as follows:
where IDFi is the inverse document frequency of the feature word i.
So far, the second layer calculation ends.
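Since the second-layer formula itself is not reproduced in this text, the following sketch assumes each component of the text vector D is the product TDFi × IDFi, consistent with "summing the product of the word structure weight and the word frequency feature" described elsewhere in the document; the function name is illustrative:

```python
def text_vector(tdf_weights, idf_values):
    # Assumed second layer: component i of D is TDFi * IDFi, for the
    # k feature words of the news text.
    return [tdf * idf for tdf, idf in zip(tdf_weights, idf_values)]
```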
In this embodiment, the quality and accuracy of the text vector can be improved by obtaining the paragraph weights of each feature word in each paragraph of the news text and summing them, and further obtaining the text vector of the whole news text.
In some embodiments, determining the emotion type of the news text based on the text vector includes:
inputting the text vector into a vector machine of a news emotion prediction model;
and classifying emotion tendencies of the text vectors based on a vector machine, and determining emotion types of the news text, wherein the news emotion prediction model is trained based on a news sample set and emotion type labels.
The news emotion prediction model further comprises the following steps: and (5) a vector machine.
When the emotion types include the positive and negative classes, the news emotion prediction model uses a support vector machine (Support Vector Machine, SVM) as a binary classifier: the input text vector D is mapped into a high-dimensional space, and a hyperplane is sought in that space which divides the input text vectors into the two classes, positive and negative, thereby realizing prediction of the emotion type of the news text. After the classification of the vector machine is finished, the classification result is output as the emotion type of the news text.
The news sample set comprises a plurality of text samples, each text sample corresponds to an emotion type label, the text samples with emotion type labels are sequentially input into the news emotion prediction model, and training of the news emotion prediction model can be achieved.
In the embodiment, the emotion type is predicted by using a vector machine, so that the method has high classification accuracy and good generalization capability.
A specific embodiment is described below.
As shown in fig. 2, a news sample set is constructed by crawling a plurality of published news texts as text samples by a crawler.
Each text sample is labeled with an emotion type label, and the text samples are divided evenly into two emotion types of equal number: the positive emotion type and the negative emotion type.
Each text sample is segmented using the jieba word segmentation tool, stop words are filtered out, and the arrangement structure of the original paragraphs and the punctuation marks are retained, so that the word structure weight of each feature word can be conveniently calculated.
A dictionary object array structure is constructed for all feature words in all text samples of the news sample set. The length of the dictionary object array equals the total number of distinct feature words in all text samples, and each index value represents a different feature word, used to express the text vector of a whole text sample; that is, the dictionary contains all words appearing in all text samples. The construction process can be realized through one-hot coding.
For example, if the news sample set contains 3 text samples, for example:
text sample 1: weather today is really good.
Text sample 2: is there rain in the open?
Text sample 3: weather forecast indicates rainy days.
All words contained in the three text samples are collected and de-duplicated to obtain a vocabulary, for example: today, weather, really good, tomorrow, will, rain, weather forecast, says.
Next, a vector representation is constructed for each text sample according to the word order in the vocabulary. For example, for text sample 1, the following vector may be obtained: [1,1,1,0,0,0,0,0];
wherein the first element of the vector represents the first word in the vocabulary, i.e. "today", the value of the corresponding position is 1 if this word appears in the text sample, otherwise 0. Similarly, the second element of the vector represents the second word in the vocabulary, i.e., "weather," and so on.
In this way, each text sample can be represented as a sparse vector, where the value at each location represents the number of occurrences of the corresponding feature word in the text sample, facilitating text classification, clustering, retrieval, and other tasks.
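The vocabulary and vector construction above can be sketched as follows, using English stand-ins for the segmented Chinese words (the stand-in tokens are illustrative, not from the original):

```python
# English stand-ins for the segmented words of the three text samples.
samples = [
    ["today", "weather", "really-good"],     # text sample 1
    ["tomorrow", "will", "rain"],            # text sample 2
    ["weather-forecast", "says", "rain"],    # text sample 3
]

# Build the deduplicated vocabulary in order of first appearance.
vocab = []
for words in samples:
    for word in words:
        if word not in vocab:
            vocab.append(word)

# Represent each text sample as an occurrence-count vector over the vocabulary.
vectors = [[words.count(word) for word in vocab] for words in samples]
```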
The dictionary object array structure of the feature words means that each feature word in the dictionary is packaged into an object and stored in an array. The object of each feature word typically contains the following information: the word itself, the part of speech of the word (e.g., noun, verb, etc.), the paraphrase or definition of the word (e.g., a word or group of words expressing a meaningful concept), and other related attributes such as synonyms, antonyms, and example sentences.
By using the dictionary object array structure, the characteristic words in the dictionary can be conveniently traversed, searched and operated.
In the dictionary object array structure, the index of the array represents a feature word for expressing the text vector of the whole text sample.
For a news sample set containing multiple text samples, a Bag of Words Model (Bag-of-Words Model) may be used to represent each text sample as a vector of Words for subsequent text analysis and processing.
The inverse document frequency IDF measures the importance or distinguishing power of a feature word with respect to the news sample set. It is obtained by calculating the inverse of the frequency of text samples containing the feature word in the whole news sample set, is used to distinguish common words from keywords, and is of important significance for information retrieval and text analysis tasks.
For example, if a feature word appears in most text samples, the inverse document frequency of the feature word is low; conversely, if a feature word occurs in a small number of text samples, the inverse document frequency of the feature word is high.
The calculation formula of the inverse document frequency IDFi of the feature word i is as follows:
IDFi=log(Ntd/Ncdi)
where Ntd is the number of text samples in the news sample set and Ncdi is the number of text samples containing the feature word i.
Ntd is a positive integer describing the size of the news sample set, and Ncdi is a non-negative integer measuring the frequency of occurrence of the feature word i in the news sample set.
The inverse document frequency IDFi of each feature word is stored in a dictionary object array structure.
As a feature word appears in more text samples, Ncdi increases and the IDF value decreases, indicating that the feature word is of lower importance to the news sample set; conversely, as a feature word appears in fewer text samples, Ncdi decreases and the IDF value increases, indicating that the feature word is of higher importance to the news sample set.
In addition, the IDF value can be smoothed to avoid excessively large IDF values when Ncdi is small. For example, adjustments such as adding 1 to the denominator or taking a logarithm can be used.
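The IDF computation can be sketched as follows. The exact smoothing used is not specified in the text, so the +1 on the denominator here is one of the adjustments mentioned above, taken as an assumption:

```python
import math

def inverse_document_frequency(ntd, ncdi):
    """IDFi = log(Ntd / Ncdi).

    ntd: number of text samples in the news sample set (Ntd).
    ncdi: number of text samples containing feature word i (Ncdi).
    The +1 smoothing on the denominator is an assumption: it avoids
    division by zero when ncdi is 0 and tempers very large IDF values
    when ncdi is small.
    """
    return math.log(ntd / (ncdi + 1))
```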
After the inverse document frequency IDF of all feature words is counted, the counted inverse document frequencies are stored in the in-memory dictionary object array.
Next, a first layer and a second layer of a news emotion prediction model are constructed, and a text vector of each text sample is obtained.
The paragraph position coefficient of each paragraph of the feature word in the text sample is Wpp, and the position and number of occurrences of feature word i in each paragraph of each text sample are recorded statistically. The paragraph position coefficients Wpp of the first and last paragraphs are set to 1; the title is important in news, so its paragraph position coefficient Wpp is 1.5; the Wpp of a middle paragraph then decreases according to its percentage distance from the first or last paragraph, mainly to better spread out the differentiation of paragraph positions.
The paragraph position coefficient Wpp of each feature word is calculated as follows:
Wpp=1.5*isTitle+1*(1-abs(paragraphPosition))*(1-distancePercentage)
if the feature word is not in the title of the text sample, isTitle takes a value of 0. paragraphPosition represents the position of the paragraph in which the feature word is located: it takes a value of 1 for the first paragraph, 0 for a middle paragraph, and -1 for the last paragraph. distancePercentage represents how far the paragraph in which the feature word is located lies from the first or last paragraph, with values ranging from 0 to 1; the smaller distancePercentage is, the closer the paragraph is to the middle of the text.
For paragraph lengths, the average length is set to 1, and the weight of the paragraph length is calculated according to the average length percentage.
The paragraph length coefficient of each feature word is Wpl, and the calculation formula is as follows:
Wpl=paragraphLength/averageLength
where paragraphLength represents the length of the paragraph in which the feature word is located, averageLength represents the average paragraph length of the text sample, Lj represents the length of the paragraph with sequence number j (j=1, 2, 3, …, n), and there are n paragraphs in the entire text sample.
According to the position of the sentence in which the feature word is located, a sentence position coefficient Wsp is given to the feature word: if the sentence is at the beginning or the end, Wsp is 1.5; otherwise Wsp is 1.
The sentence position coefficient of each feature word is Wsp, and the calculation formula is as follows:
Wsp=1.5*isStart+1.5*isEnd+1*(1-isStart)*(1-isEnd)
wherein, isStart represents whether the sentence where the feature word is located is at the beginning, if so, the value of isStart is 1, and if not, the value of isStart is 0; isEnd indicates whether the sentence in which the feature word is located is at the end, if so, the isEnd takes a value of 1, and if not, the isEnd takes a value of 0.
According to the sentence type of the sentence in which the feature word is located, a sentence type coefficient Wst is given to the feature word: if the sentence is a question or an exclamation, Wst is 1.5; otherwise Wst is 1.
The sentence type coefficient of each feature word is Wst, and the calculation formula is as follows:
Wst=1.5*sentenceType+1*(1-sentenceType)
where sentenceType indicates the sentence type of the sentence in which the feature word is located: if the sentence is a question or an exclamation, sentenceType takes a value of 1; otherwise sentenceType takes a value of 0.
There are m feature words i (i=1, 2, 3, …, m) in paragraph j, and the paragraph weight of feature word i in paragraph j is denoted TPF(i, j).
If the word structure weight of the feature word i is obtained based on the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient, the calculation formula of the paragraph weight TPF (i, j) of the feature word i in the paragraph j is as follows:
the number of the feature words i in the whole text sample is Nd, j (j=1, 2,3, …, n) represents a paragraph, the word structure weight of the feature words i in the whole text sample is denoted as TDFi, and the calculation formula is as follows:
k feature words exist in the whole text sample, the text vector D of the text sample can be expressed by all feature words i (i=1, 2,3, …, k), and the calculation formula of the text vector D is as follows:
where IDFi is the inverse document frequency of the feature word i.
After the text vector for each text sample is obtained, the text vector carrying emotion type tags is divided into a training set and a verification set.
Using LIBLINEAR, an open-source class library for large-scale linear classification and regression, the training set and the verification set are input into the SVM class library, a linear SVM model using L1 regularization and an L2 loss function is selected, the SVM problem is finally constructed, and the vector machine of the news emotion prediction model is solved.
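The training step above can be sketched with scikit-learn's LinearSVC, which wraps LIBLINEAR; the L1-regularized, squared-hinge (L2) loss configuration matches the description, while the toy vectors and labels below are invented for illustration (real inputs would be the text vectors produced by the first two layers):

```python
from sklearn.svm import LinearSVC

# Toy text vectors D with emotion type labels (1 = positive, 0 = negative).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]

# L1 regularization with the squared-hinge (L2) loss requires dual=False
# in scikit-learn's LIBLINEAR wrapper.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=10.0)
clf.fit(X, y)
predictions = list(clf.predict(X))
```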
And evaluating and optimizing the first layer, the second layer and the vector machine of the obtained news emotion prediction model by using methods such as cross verification and the like so as to improve the performance and generalization capability of the news emotion prediction model.
In the embodiment, a basis is provided for accurate prediction of emotion types of news texts by constructing a news emotion prediction model.
According to the news emotion prediction method provided by the embodiment of the application, the execution subject can be a news emotion prediction device. In the embodiment of the present application, a news emotion prediction method performed by a news emotion prediction device is taken as an example, and the news emotion prediction device provided in the embodiment of the present application is described.
The embodiment of the application also provides a news emotion prediction device.
As shown in fig. 3, the news emotion prediction apparatus includes: an acquisition module 310, a first processing module 320, a second processing module 330, and a third processing module 340.
An acquisition module 310, configured to acquire news text to be processed;
a first processing module 320, configured to determine, based on the news text, a word structure weight and a word frequency feature of feature words in the news text;
a second processing module 330, configured to generate a text vector of the news text based on the word structure weight and the word frequency feature of the feature word;
a third processing module 340 is configured to determine an emotion type of the news text based on the text vector.
According to the news emotion prediction device provided by the embodiment of the application, the word structure weight and the word frequency feature of each feature word in the news text are calculated from the structural characteristics of the news, a text vector is constructed, and emotion prediction of the news based on the text vector is realized. Because the word structure weight is constructed with structural factors of the news text as variables, it fits and highlights the special structural characteristics of news text; there is no need to truncate the text and lose part of the content features, nor to rely on the quality of an external dictionary; emotion prediction for long news can be supported, and the prediction precision of news emotion is effectively improved.
In some embodiments, the word structure weights are determined by:
Determining at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient and a sentence type coefficient corresponding to the feature word based on the news text;
the word structure weight of the feature word is determined based on at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient, and a sentence type coefficient to which the feature word corresponds.
In some embodiments, determining the word structure weight of the feature word based on at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient, and a sentence type coefficient to which the feature word corresponds includes:
summing at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient and a sentence type coefficient of each feature word in each paragraph of the news text to generate a paragraph weight corresponding to each feature word and each paragraph;
based on the number of each feature word in the news text, summing all paragraph weights of each feature word to generate a word structure weight of the feature word.
In some embodiments, after obtaining the news text to be processed, before determining the word structure weights and the word frequency features of the feature words in the news text based on the news text, the method further includes:
Word segmentation is carried out on the news text;
and performing stop word filtering on the segmented news text, determining characteristic words in the news text subjected to stop word filtering, wherein the news text subjected to stop word filtering has the same paragraph structure as the news text to be processed.
In some embodiments, determining word structure weights and word frequency features for feature words in a news text based on the news text includes:
inputting news texts into a news emotion prediction model, wherein the news emotion prediction model comprises a first layer and a second layer;
based on the first layer, feature extraction and weight calculation of feature words are carried out on the news text, and word structure weights and word frequency features of the feature words in the news text are determined;
generating a text vector of the news text based on the word structure weight and the word frequency feature of the feature word, comprising:
and summing, based on the second layer, the product of the word structure weight and the word frequency feature of each feature word to obtain the text vector.
In some embodiments, the news emotion prediction model further includes a vector machine that determines emotion types of the news text based on the text vectors, including:
and classifying emotion tendencies of the text vectors based on a vector machine, and determining emotion types of the news text, wherein the news emotion prediction model is trained based on a news sample set and emotion type labels.
The news emotion prediction device in the embodiment of the application may be an electronic device, or may be a component in the electronic device, for example, an integrated circuit or a chip. The electronic device may be a terminal, or may be other devices than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook or personal digital assistant (PDA), etc., but may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not specifically limited.
The news emotion prediction device in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The news emotion prediction device provided in the embodiment of the present application can implement each process implemented in the embodiments of the news emotion prediction methods in fig. 1 and fig. 2, and in order to avoid repetition, a detailed description is omitted here.
In some embodiments, as shown in fig. 4, the embodiment of the present application further provides an electronic device 400, including a processor 401, a memory 402, and a computer program stored in the memory 402 and capable of running on the processor 401, where the program when executed by the processor 401 implements the respective processes of the above-mentioned news emotion prediction method embodiment, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
The embodiment of the present application further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program when executed by a processor implements each process of the embodiment of the news emotion prediction method, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
The processor is a processor in the electronic device in the above embodiment. Readable storage media include computer readable storage media such as computer readable memory ROM, random access memory RAM, magnetic or optical disks, and the like.
The embodiment of the application also provides a computer program product, which comprises a computer program, and the computer program realizes the news emotion prediction method when being executed by a processor.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or instructions, each process of the embodiment of the news emotion prediction method can be realized, the same technical effect can be achieved, and in order to avoid repetition, the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A news emotion prediction method, comprising:
acquiring news text to be processed;
determining word structure weights and word frequency characteristics of feature words in the news text based on the news text;
Generating a text vector of the news text based on the word structure weight of the feature word and the word frequency feature;
and determining the emotion type of the news text based on the text vector.
2. The news emotion prediction method of claim 1, wherein the word structure weight is determined by:
determining at least one of a paragraph position coefficient, a paragraph length coefficient, a sentence position coefficient and a sentence type coefficient corresponding to the feature word based on the news text;
and determining the word structure weight of the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient corresponding to the feature word.
3. The news emotion prediction method of claim 2, wherein the determining the word structure weight of the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient, and the sentence type coefficient corresponding to the feature word comprises:
determining a paragraph weight corresponding to each paragraph of the news text by the feature word based on at least one of the paragraph position coefficient, the paragraph length coefficient, the sentence position coefficient and the sentence type coefficient of the feature word;
Generating the word structure weight of the feature word based on the number of feature words and the sum of all the paragraph weights of the feature words.
4. A news emotion prediction method as claimed in any one of claims 1 to 3, characterized in that after said obtaining a news text to be processed, before said determining, based on said news text, word structure weights and word frequency features of feature words in said news text, said method further comprises:
word segmentation processing is carried out on the news text;
and performing stop word filtering on the news text after word segmentation, and determining characteristic words in the news text after stop word filtering, wherein the news text after stop word filtering has the same paragraph structure as the news text to be processed.
5. A news emotion prediction method as claimed in any one of claims 1 to 3, characterized in that said determining the emotion type of the news text based on the text vector comprises:
inputting the text vector into a vector machine of a news emotion prediction model;
and classifying emotion tendencies of the text vectors based on the vector machine, and determining the emotion type of the news text, wherein the news emotion prediction model is trained based on a news sample set and emotion type labels.
6. The news emotion prediction method of claim 5, wherein the news emotion prediction model further comprises a first layer and a second layer, and wherein determining word structure weights and word frequency features of feature words in the news text based on the news text comprises:
inputting the news text into a news emotion prediction model;
based on the first layer, feature extraction and weight calculation of feature words are carried out on the news text, and the word structure weight and the word frequency feature of the feature words in the news text are determined;
generating a text vector of the news text based on the word structure weight of the feature word and the word frequency feature, including:
and summing the product of the word structure weight and the word frequency characteristic of each characteristic word based on the second layer to obtain the text vector.
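The second-layer "sum of products" in claim 6 is ambiguous as translated: summing scalar products would yield a single number, not a vector. One plausible reading, sketched below, is that each feature word carries a per-word vector (e.g. an embedding), and the text vector is the element-wise sum of each word's vector scaled by its structure weight and word frequency. The embedding representation is an assumption, not something the claim discloses.

```python
def text_vector(feature_words):
    """feature_words: list of (structure_weight, word_freq, embedding)
    tuples, one per feature word. Returns the element-wise sum over all
    feature words of structure_weight * word_freq * embedding."""
    dim = len(feature_words[0][2])
    vec = [0.0] * dim
    for weight, freq, embedding in feature_words:
        for i in range(dim):
            vec[i] += weight * freq * embedding[i]
    return vec
```

Under this reading, the text vector has the fixed dimensionality of the word vectors regardless of document length, which is what a downstream SVM classifier requires.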
7. A news emotion prediction apparatus, comprising:
an acquisition module, configured to acquire a news text to be processed;
a first processing module, configured to determine word structure weights and word frequency features of feature words in the news text based on the news text;
a second processing module, configured to generate a text vector of the news text based on the word structure weights and word frequency features of the feature words; and
a third processing module, configured to determine the emotion type of the news text based on the text vector.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the news emotion prediction method of any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the news emotion prediction method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the news emotion prediction method of any one of claims 1-6.
CN202311500108.2A 2023-11-10 2023-11-10 News emotion prediction method and device, electronic equipment and storage medium Pending CN117493570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311500108.2A CN117493570A (en) 2023-11-10 2023-11-10 News emotion prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117493570A true CN117493570A (en) 2024-02-02

Family

ID=89682535

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination