CN107688651B - News emotion direction judgment method, electronic device and computer readable storage medium - Google Patents

News emotion direction judgment method, electronic device and computer readable storage medium

Info

Publication number
CN107688651B
Authority
CN
China
Prior art keywords
news
event
predicted
emotion
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710775417.9A
Other languages
Chinese (zh)
Other versions
CN107688651A (en)
Inventor
陈一恋
汪超慧
王智
汪伟
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201710775417.9A (patent CN107688651B)
Priority to PCT/CN2017/108811 (patent WO2019041528A1)
Publication of CN107688651A
Application granted
Publication of CN107688651B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G06F16/3326 Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news emotion direction judgment method, which comprises the following steps: performing semantic scoring on news to be predicted through a preset machine learning algorithm to obtain the emotion score of the news to be predicted; adjusting the emotion scores of the news to be predicted, which are obtained by the preset machine learning algorithm, according to a preset event label-event keyword rule; and determining the emotional direction of the news to be predicted according to the adjusted emotional score of the news to be predicted. The method and the device can improve the accuracy of judging the emotional direction of the news.

Description

News emotion direction judgment method, electronic device and computer readable storage medium
Technical Field
The invention relates to the technical field of computer information, in particular to a news emotion direction judging method, electronic equipment and a computer readable storage medium.
Background
When news is semantically analyzed, attention is often paid to whether its emotional direction is positive or negative, and to what degree. Existing methods generally use a machine learning method (such as a random forest algorithm) to calculate a score for the news and judge whether the news is positive or negative from that score alone. The accuracy of the result is not high, which may lead to a poor customer experience. The prior-art method for judging the emotional direction of news is therefore not well designed and needs to be improved.
Disclosure of Invention
In view of this, the invention provides a news emotion direction judgment method, an electronic device and a computer readable storage medium, which adjust a news emotion score obtained by a machine learning algorithm through a preset event tag hit rule (including an event keyword or an event regular expression), so that the accuracy of news emotion direction judgment is effectively improved.
Firstly, in order to achieve the above object, the present invention provides a method for judging a news emotion direction, which is applied to an electronic device, and the method includes:
performing semantic scoring on news to be predicted through a preset machine learning algorithm to obtain the emotion score of the news to be predicted;
adjusting the emotion score of the news to be predicted, which is obtained by the preset machine learning algorithm, according to a preset event label-event keyword rule; and
determining the emotional direction of the news to be predicted according to the adjusted emotion score of the news to be predicted.
Preferably, the event tag-event keyword rule is set as a first file, and the first file comprises event tags for distinguishing event categories, event keywords, and emotion scores corresponding to each event keyword.
Preferably, the adjusting the sentiment score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, includes:
traversing the title and the text of the news to be predicted;
if an event keyword in the first file is identified in the title and the text of the news to be predicted, taking the emotion score corresponding to the identified event keyword in the first file as the final score of the news to be predicted, and taking the event label corresponding to the identified event keyword as the main business event of the news to be predicted; and
if no event keyword in the first file is identified in the title and the text of the news to be predicted, taking the emotion score of the news to be predicted, which is obtained by the preset machine learning algorithm, as the final score of the news to be predicted.
Preferably, the adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes:
if an event keyword in the first file is identified in the title and the text of the news to be predicted, and the emotion score corresponding to the identified event keyword in the first file is not in the same grade as the emotion score obtained by the preset machine learning algorithm, performing a weighted calculation on the two scores, with the emotion score corresponding to the identified event keyword carrying the main weight, to obtain a weighted score as the final score of the news to be predicted.
Preferably, the weighting calculation includes:
multiplying the emotion score corresponding to the identified event keyword in the first file by a first preset proportion, and multiplying the emotion score obtained by the preset machine learning algorithm by a second preset proportion; and
adding the two products to obtain a weighted score as the final score of the news to be predicted, wherein the first preset proportion is greater than the second preset proportion, and the sum of the first preset proportion and the second preset proportion is 1.
Preferably, the method further comprises:
adjusting the emotion score of the news to be predicted, which is obtained by the preset machine learning algorithm, according to a preset event label-event regular expression rule, wherein the event label-event regular expression rule is set as a second file, and the second file comprises event labels for distinguishing event categories, event regular expressions, and the emotion score corresponding to each event regular expression.
Preferably, the adjusting the sentiment score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, includes:
if content matching an event regular expression in the second file is identified in the title and the text of the news to be predicted, taking the emotion score of that event regular expression in the second file as the final score of the news to be predicted, and taking the event label corresponding to the event regular expression as the main business event of the news to be predicted; and
if no event keyword in the first file is identified in the title and the text of the news to be predicted, and no content matching an event regular expression in the second file is identified, taking the emotion score of the news to be predicted, which is obtained by the preset machine learning algorithm, as the final score of the news to be predicted.
Preferably, the adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes:
if content matching an event regular expression in the second file is identified in the title and the text of the news to be predicted, and the emotion score of that event regular expression in the second file is not in the same grade as the emotion score obtained by the preset machine learning algorithm, performing a weighted calculation on the two scores, with the emotion score corresponding to the event regular expression carrying the main weight, to obtain a weighted score as the final score of the news to be predicted.
In addition, in order to achieve the above object, the present invention further provides an electronic device, which includes a memory and a processor, wherein the memory stores a news emotion direction determination system operable on the processor, and when the news emotion direction determination system is executed by the processor, the processor executes the steps of the above news emotion direction determination method.
Further, to achieve the above object, the present invention also provides a computer readable storage medium storing a news emotion direction determination system, which is executable by at least one processor, so that the at least one processor executes the steps of the news emotion direction determination method.
Compared with the prior art, the electronic equipment, the news emotion direction judging method and the computer readable storage medium provided by the invention adjust the news emotion scores obtained by a machine learning algorithm (such as a random forest algorithm) through the preset event label hit rule (including event keywords or event regular expressions).
Drawings
FIG. 1 is a diagram of an alternative hardware architecture for an electronic device of the present invention;
FIG. 2 is a block diagram of the program modules of an embodiment of the news emotion direction determination system in the electronic device of the present invention;
FIG. 3 is a schematic flow chart of an embodiment of the news emotion direction determination method of the present invention.
Reference numerals:
[Table of reference numerals: image not reproduced in this text]
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
It is further noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
First, the present invention provides an electronic device 2.
Fig. 1 is a schematic diagram of an alternative hardware architecture of the electronic device 2 according to the present invention. In this embodiment, the electronic device 2 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23, which may be communicatively connected to each other through a system bus. It is noted that fig. 1 only shows the electronic device 2 with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The electronic device 2 may be a rack server, a blade server, a tower server, or a cabinet server, and the electronic device 2 may be an independent server or a server cluster formed by a plurality of servers.
The memory 21 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 2. The memory 21 may also comprise both an internal storage unit and an external storage device of the electronic device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the program code of the news emotion direction determination system 20. The memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is generally configured to control the overall operation of the electronic device 2, such as performing control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is configured to run the program codes stored in the memory 21 or process data, for example, run the news emotion direction determination system 20.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 connects the electronic device 2 to an external data platform through a network and establishes a data transmission channel and a communication connection between the two. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
The application environment and the hardware structure and function of the related devices of the various embodiments of the present invention have been described in detail so far. Hereinafter, various embodiments of the present invention will be proposed based on the above-described application environment and related devices.
Fig. 2 is a block diagram of the program modules of an embodiment of the news emotion direction determination system 20 in the electronic device 2 according to the present invention. In this embodiment, the news emotion direction determination system 20 can be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present invention. For example, in fig. 2, the news emotion direction determination system 20 is divided into a scoring module 201, an adjusting module 202, and a judging module 203. A program module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a whole program for describing how the news emotion direction determination system 20 executes in the electronic device 2. The functions of the program modules 201-203 are described in detail below.
The scoring module 201 is configured to perform semantic scoring on the news to be predicted through a predetermined machine learning algorithm, and acquire an emotion score of the news to be predicted.
Preferably, in this embodiment, the predetermined machine learning algorithm may be a random forest algorithm (e.g., as provided by the open-source package Weka), and its semantic scoring includes the following steps:
(1) first, manually selecting a training set for the random forest model, in which the positive and negative news samples are the titles of the news items;
(2) acquiring the Chinese word vector library required for training the model (the corpus of the vector library can be open-source wiki news content), segmenting the training samples in the training set into words with HanLP, and standardizing each training sample by replacing its segmented words with word vectors;
(3) selecting training set tuples through a bagging algorithm, training each decision tree in the random forest model through the RandomTree algorithm, and repeating M times to obtain M base classifiers;
(4) prediction: converting the title of the news to be predicted into vectors, letting the trained base classifiers vote, and taking the category with the most votes as the category of the news (e.g., the positive or the negative category). The number of trees predicting a specified category (e.g., category 1) is divided by the total number of decision trees to obtain the probability p of that category, where p lies in [0, 1]. The formula score = 2 × p - 1 then maps p to the range [-1, 1], and the converted value is taken as the emotion score of the news to be predicted.
For example, suppose news A to be predicted is scored by a trained model that has 1000 decision trees.
If 520 trees predict category 0 (the negative category) and 480 trees predict category 1 (the positive category), the category of news A is 0, and the corresponding emotion score is score = 2 × (480/1000) - 1 = -0.04;
if 520 trees predict category 1 and 480 trees predict category 0, the category of news A is 1, and the corresponding emotion score is score = 2 × (520/1000) - 1 = 0.04.
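The vote-to-score conversion in step (4) can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the patent, and the tie-breaking rule (a tie falls to category 0) is an assumption, since the text does not say how ties are resolved.

```python
def vote_to_score(votes_for_category_1: int, total_trees: int) -> float:
    """Map the share of trees voting for category 1 (positive) from [0, 1] to [-1, 1]."""
    if total_trees <= 0:
        raise ValueError("total_trees must be positive")
    if not 0 <= votes_for_category_1 <= total_trees:
        raise ValueError("votes_for_category_1 must lie between 0 and total_trees")
    p = votes_for_category_1 / total_trees  # probability of category 1, in [0, 1]
    return 2 * p - 1                        # score = 2 * p - 1, in [-1, 1]


def majority_category(votes_for_category_1: int, total_trees: int) -> int:
    """Return 1 if most trees vote positive, else 0 (ties go to 0: an assumption)."""
    votes_for_category_0 = total_trees - votes_for_category_1
    return 1 if votes_for_category_1 > votes_for_category_0 else 0


# The two cases from the example with 1000 decision trees.
for positive_votes in (480, 520):
    category = majority_category(positive_votes, 1000)
    score = vote_to_score(positive_votes, 1000)
    print(category, round(score, 2))
```

Because the score is derived only from the vote share, a score near 0 means the trees were almost evenly split, which is the situation in which the rule-based adjustment is most useful.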
The adjusting module 202 is configured to adjust the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, according to a preset event tag-event keyword rule. The event tag-event keyword rule may be set as a first file (e.g., a first dynamic dictionary), that is, specific content of the event tag-event keyword rule is recorded in a file form (in this embodiment, the first file). In this embodiment, the first file may include the following: event labels (used for distinguishing the categories of events, such as development adjustment and the like), event keywords (such as transformation, upgrading and the like), and emotion scores (scores) corresponding to the event keywords.
For example, the first file may be set to the format of the following file A:
[File A: table image not reproduced in this text; each row lists an event label, its event keywords, and the corresponding emotion score]
In file A above, if any event keyword in the first row (e.g., "transformation") is identified in the news headline, the main business event of the news is the corresponding event label ("development adjustment"), and the emotion score of the news is 0.2.
Preferably, in this embodiment, the score range corresponding to the event keywords may be set to [-1, 1]. The score range may be further divided into sub-intervals of several grades (gears), for example the following four: [-1, -0.75), [-0.75, -0.5), [-0.5, -0.04), and [-0.04, 1], where [-1, -0.75) and [-0.75, -0.5) represent major negative news, [-0.5, -0.04) represents general negative news, and [-0.04, 1] represents positive news. Similarly, the emotion score range [-1, 1] obtained by the predetermined machine learning algorithm (such as a random forest algorithm) can be divided into the same four grades.
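As an illustration only, the four example sub-intervals can be looked up with a sorted list of boundaries. The names `GRADE_BOUNDS`, `GRADE_LABELS`, and `grade_of` are hypothetical; the boundary values and labels are the example values given above.

```python
import bisect

# Upper boundaries of the first three example sub-intervals.
GRADE_BOUNDS = (-0.75, -0.5, -0.04)
# Labels for [-1, -0.75), [-0.75, -0.5), [-0.5, -0.04), [-0.04, 1].
GRADE_LABELS = ("major negative", "major negative", "general negative", "positive")


def grade_of(score: float):
    """Return (grade index, label) for a score in [-1, 1]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    # bisect_right places a score equal to a boundary in the interval that
    # starts at that boundary, matching the half-open intervals [a, b).
    index = bisect.bisect_right(GRADE_BOUNDS, score)
    return index, GRADE_LABELS[index]


print(grade_of(0.2))    # a keyword score of 0.2 falls in [-0.04, 1]
print(grade_of(-0.2))   # a model score of -0.2 falls in [-0.5, -0.04)
```

Keeping the boundaries in one tuple means the same lookup can be applied to both the rule scores and the machine learning scores.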
Specifically, adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, includes the following steps:
traversing the title and the text of the news to be predicted;
if an event keyword in the first file is identified in the title and the text of the news to be predicted, taking the emotion score corresponding to the identified event keyword in the first file as the final score of the news to be predicted, and taking the event label corresponding to the identified event keyword as the main business event of the news to be predicted;
if no event keyword in the first file is identified in the title and the text of the news to be predicted, taking the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, as the final score of the news to be predicted.
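The keyword override described in these steps might be sketched as follows. This is not the patent's implementation: `KeywordRule`, `FIRST_FILE`, and `adjust_by_keywords` are hypothetical names, the single rule reuses the example values from the description ("development adjustment", "transformation", "upgrading", 0.2), and the choice that the first matching rule wins is an assumption, because the text does not say what happens when several rules match.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class KeywordRule(NamedTuple):
    event_label: str
    keywords: Tuple[str, ...]
    score: float


# Hypothetical in-memory stand-in for the first file (dynamic dictionary).
FIRST_FILE = (
    KeywordRule("development adjustment", ("transformation", "upgrading"), 0.2),
)


def adjust_by_keywords(
    title: str,
    body: str,
    ml_score: float,
    rules: Sequence[KeywordRule] = FIRST_FILE,
) -> Tuple[float, Optional[str]]:
    """Return (final score, main business event) for one news item."""
    text = f"{title}\n{body}"  # traverse both the title and the text
    for rule in rules:
        if any(keyword in text for keyword in rule.keywords):
            # A keyword hit replaces the model score (first match wins: an assumption).
            return rule.score, rule.event_label
    # No keyword identified: keep the machine learning score.
    return ml_score, None


print(adjust_by_keywords("Company announces transformation plan", "", -0.3))
print(adjust_by_keywords("Quarterly meeting held", "", -0.3))
```

Substring matching is used here for brevity; for Chinese news text, matching against segmented words may be more precise.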
Preferably, in other embodiments, the adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes the following steps:
if an event keyword in the first file is identified in the title and the text of the news to be predicted, and the emotion score corresponding to the identified event keyword in the first file is not in the same grade as the emotion score obtained by the predetermined machine learning algorithm (that is, the two scores do not lie in the same sub-interval, such as [-0.04, 1]), a weighted calculation is performed on the two scores, with the emotion score corresponding to the identified event keyword carrying the main weight, and the resulting weighted score is taken as the final score of the news to be predicted.
Specifically, the weighted calculation includes: multiplying the emotion score corresponding to the identified event keyword in the first file by a first preset proportion (such as 60%), multiplying the emotion score obtained by the predetermined machine learning algorithm by a second preset proportion (such as 40%), and adding the two products to obtain a weighted score, which is used as the final score of the news to be predicted. The first preset proportion is greater than the second preset proportion, and the sum of the two proportions is 1.
For example, if the emotion score corresponding to the identified event keyword in the first file is 0.2 (in the grade [-0.04, 1]) and the emotion score obtained by the predetermined machine learning algorithm is -0.2 (in the grade [-0.5, -0.04)), the two scores are clearly not in the same grade, so the score is adjusted with the score of 0.2 carrying the main weight. With the example proportions of 60% and 40%, the weighted score is 0.6 × 0.2 + 0.4 × (-0.2) = 0.04.
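The weighted calculation can be checked with a short sketch. The function names are hypothetical, the grade boundaries are the example sub-intervals, and the default weight of 0.6 is the example proportion. One assumption is made explicit in the code: when both scores fall in the same grade, the rule score is returned unchanged, following the basic embodiment in which a keyword hit supplies the final score.

```python
import bisect

GRADE_BOUNDS = (-0.75, -0.5, -0.04)  # example boundaries of the four grades


def grade_index(score: float) -> int:
    """Index of the half-open sub-interval of [-1, 1] that contains score."""
    return bisect.bisect_right(GRADE_BOUNDS, score)


def weighted_final_score(rule_score: float, ml_score: float, rule_weight: float = 0.6) -> float:
    """Blend rule and model scores when they disagree on the grade."""
    if not 0.5 < rule_weight < 1.0:
        # The first proportion must exceed the second, and the two sum to 1.
        raise ValueError("rule_weight must be greater than 0.5 and less than 1")
    if grade_index(rule_score) == grade_index(ml_score):
        return rule_score  # same grade: keep the rule score (assumption)
    return rule_weight * rule_score + (1.0 - rule_weight) * ml_score


# Example from the description: rule score 0.2, model score -0.2.
print(round(weighted_final_score(0.2, -0.2), 2))
```

In this example the blended score of 0.04 still lies in the grade [-0.04, 1], so the rule's positive reading prevails while the model's disagreement pulls the magnitude down.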
Preferably, in other embodiments, the adjusting module 202 is further configured to:
adjust the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, according to a preset event label-event regular expression rule. The event label-event regular expression rule may be set as a second file (such as a second dynamic dictionary), that is, the specific content of the rule is recorded in file form (in this embodiment, the second file). In this embodiment, the second file may include the following: event labels (used to distinguish the categories of events, such as performance pre-increase), event regular expressions (set according to business experience and related logic, as in file B below), and the emotion score corresponding to each event regular expression.
For example, the second file may be set to the format of the following file B:
[File B: table image not reproduced in this text; each row lists an event label, its event regular expression, and the corresponding emotion score]
In file B above, if content matching the event regular expression in the first row is identified in the news headline, the main business event of the news is the corresponding event label (e.g., "performance pre-increase"), and the emotion score of the news is 0.4.
Preferably, the score range corresponding to the event regular expressions can be set to [-1, 1]. The score range may be further divided into sub-intervals of several grades (gears), for example the following four: [-1, -0.75), [-0.75, -0.5), [-0.5, -0.04), and [-0.04, 1], where [-1, -0.75) and [-0.75, -0.5) represent major negative news, [-0.5, -0.04) represents general negative news, and [-0.04, 1] represents positive news.
Further, in this case, adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes the following step:
if content matching an event regular expression in the second file is identified in the title and the text of the news to be predicted, taking the emotion score of that event regular expression in the second file as the final score of the news to be predicted, and taking the event label corresponding to the event regular expression as the main business event of the news to be predicted.
Further, in this case, adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes the following step:
if no event keyword in the first file is identified in the title and the text of the news to be predicted, and no content matching an event regular expression in the second file is identified, taking the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, as the final score of the news to be predicted.
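A minimal sketch of the combined lookup, assuming keyword rules are checked before regular-expression rules (the text does not fix an order). All names are hypothetical, and the regular expression below is an invented placeholder: the actual expressions in file B are not reproduced in this text. Only the event label "performance pre-increase" and the score 0.4 come from the description.

```python
import re
from typing import NamedTuple, Optional, Sequence, Tuple


class RegexRule(NamedTuple):
    event_label: str
    pattern: re.Pattern
    score: float


# Hypothetical stand-in for the second file; the pattern is a placeholder.
SECOND_FILE = (
    RegexRule(
        "performance pre-increase",
        re.compile(r"net profit.*(?:increase|rise|grow)"),
        0.4,
    ),
)


def adjust_by_rules(
    title: str,
    body: str,
    ml_score: float,
    keyword_rules: Sequence[Tuple[str, Tuple[str, ...], float]] = (),
    regex_rules: Sequence[RegexRule] = SECOND_FILE,
) -> Tuple[float, Optional[str]]:
    """Return (final score, main business event) after both rule lookups."""
    text = f"{title}\n{body}"
    for label, keywords, score in keyword_rules:  # first file: keywords
        if any(keyword in text for keyword in keywords):
            return score, label
    for rule in regex_rules:                      # second file: regular expressions
        if rule.pattern.search(text):
            return rule.score, rule.event_label
    return ml_score, None                         # neither matched: keep model score


print(adjust_by_rules("Company expects net profit to increase 50%", "", -0.1))
print(adjust_by_rules("Quarterly meeting held", "", -0.1))
```

Compiling each pattern once when the file is loaded avoids recompiling it for every news item.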
Further, in this case, adjusting the emotion score of the news to be predicted, which is obtained by the predetermined machine learning algorithm, further includes the following step:
if content matching an event regular expression in the second file is identified in the title and the text of the news to be predicted, and the emotion score of that event regular expression in the second file is not in the same grade as the emotion score obtained by the predetermined machine learning algorithm (that is, the two scores do not lie in the same sub-interval, such as [-0.04, 1]), a weighted calculation is performed on the two scores, with the emotion score corresponding to the event regular expression carrying the main weight, and the resulting weighted score is taken as the final score of the news to be predicted.
Specifically, the weighted calculation includes: multiplying the emotion score corresponding to the event regular expression in the second file by a first preset proportion (such as 60%), multiplying the emotion score obtained by the predetermined machine learning algorithm by a second preset proportion (such as 40%), and adding the two products to obtain a weighted score, which is used as the final score of the news to be predicted. The first preset proportion is greater than the second preset proportion, and the sum of the two proportions is 1.
For example, if the emotion score of the event regular expression in the second file is 0.4 (in the grade [-0.04, 1]) and the emotion score obtained by the predetermined machine learning algorithm is -0.4 (in the grade [-0.5, -0.04)), the two scores are clearly not in the same grade, so the score is adjusted with the score of 0.4 carrying the main weight. With the example proportions of 60% and 40%, the weighted score is 0.6 × 0.4 + 0.4 × (-0.4) = 0.08.
The judging module 203 is configured to determine the emotional direction of the news to be predicted according to the adjusted emotion score of the news to be predicted. Specifically, if the adjusted emotion score of the news to be predicted lies in a first score interval (such as [-1, -0.04)), the emotional direction of the news to be predicted is determined to be negative; if the adjusted emotion score lies in a second score interval (such as [-0.04, 1]), the emotional direction of the news to be predicted is determined to be positive.
In other embodiments, the first score interval or the second score interval may be further subdivided. For example, the first score interval [-1, -0.04) may be further divided into the sub-intervals [-1, -0.5) and [-0.5, -0.04), where [-1, -0.5) represents major negative news and [-0.5, -0.04) represents general negative news.
Through the program modules 201-203 described above, the news emotion direction determination system 20 provided by the invention uses the preset event label hit rules (event keywords or event regular expressions) to adjust the news emotion score obtained by the machine learning algorithm (such as a random forest algorithm).
In addition, the invention also provides a news emotion direction judgment method.
Fig. 3 is a schematic diagram illustrating an implementation flow of the method for determining a news emotion direction according to an embodiment of the present invention. In this embodiment, the execution order of the steps in the flowchart shown in fig. 3 may be changed and some steps may be omitted according to different requirements.
Step S31: semantic scoring is performed on the news to be predicted through a preset machine learning algorithm to obtain the emotion score of the news to be predicted.
Preferably, in this embodiment, the preset machine learning algorithm may be a random forest algorithm (for example, as provided by the open-source package Weka), and its semantic scoring includes the following steps:
(1) manually selecting a training set for the random forest model, in which the positive and negative news data consist of the title of each news item;
(2) acquiring the Chinese word vector library required by the training set (training model) (the corpus of the vector library may be open-source wiki news content), performing HanLP word segmentation on the training samples in the training set, and standardizing each training sample by replacing the segmented words with word vectors;
(3) selecting training set tuples through a bagging algorithm, training each decision tree in the random forest model through the RandomTree algorithm, and repeating M times to obtain M base classifiers;
(4) prediction: converting the title of the news to be predicted into vectors, performing prediction voting with the trained base classifiers, and taking the category with the largest number of votes as the category of the news (such as a positive category or a negative category); dividing the number of trees that predict a specified category (such as category 1) by the total number of decision trees to obtain the probability p of the specified category, where p lies in [0, 1]; converting p into the range [-1, 1] with the formula score = 2 × p - 1; and taking the converted value as the emotion score of the news to be predicted.
For example, suppose news A to be predicted is scored by a training model that has 1000 decision trees.
If 520 trees predict category 0 (the negative category) and 480 trees predict category 1 (the positive category), the category of news A is 0, and the corresponding emotion score is score = 2 × (480/1000) - 1 = -0.04;
if 520 trees predict category 1 and 480 trees predict category 0, the category of news A is 1, and the corresponding emotion score is score = 2 × (520/1000) - 1 = 0.04.
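The conversion from tree votes to an emotion score can be sketched as follows. This is a minimal illustration in Python rather than the Weka-based implementation mentioned in the text; the function names `votes_to_score` and `majority_category` are hypothetical, and resolving a tied vote to category 0 is an assumption, since the description does not address ties.

```python
def votes_to_score(positive_votes: int, total_trees: int) -> float:
    """Map the share of trees voting for category 1 (positive) onto [-1, 1]."""
    if total_trees <= 0:
        raise ValueError("total_trees must be positive")
    if not 0 <= positive_votes <= total_trees:
        raise ValueError("positive_votes must lie in [0, total_trees]")
    p = positive_votes / total_trees   # probability of category 1, in [0, 1]
    return 2 * p - 1                   # linear rescaling to [-1, 1]


def majority_category(positive_votes: int, total_trees: int) -> int:
    """Return 1 if most trees vote positive, else 0 (ties go to 0: an assumption)."""
    negative_votes = total_trees - positive_votes
    return 1 if positive_votes > negative_votes else 0


# The two cases from the example with 1000 decision trees.
score_480 = votes_to_score(480, 1000)   # approximately -0.04
score_520 = votes_to_score(520, 1000)   # approximately  0.04
```

Because the mapping is linear, an even split of votes (p = 0.5) gives a score of 0, and unanimous votes give -1 or 1.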
Step S32: the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, is adjusted according to a preset event label-event keyword rule. The event label-event keyword rule may be set as a first file (e.g., a first dynamic dictionary); that is, the specific content of the event label-event keyword rule is recorded in file form (in this embodiment, the first file). In this embodiment, the first file may include the following: event labels (used to distinguish the categories of events, such as development adjustment), event keywords (such as transformation and upgrading), and the emotion score corresponding to each event keyword.
For example, the first file may be set in the format of the following table file A:
[Figure BDA0001395818320000151: table file A]
In file A above, if any event keyword in the first row (e.g., "transformation") is identified in the news headline, the main business event of the news is the corresponding event label ("development adjustment"), and the emotion score of the news is 0.2.
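One way the first file could be represented and loaded is sketched below. The on-disk layout (one tab-separated rule per line), the function name `load_keyword_rules`, and the grouping of "transformation" and "upgrading" under the same row are all assumptions made for illustration; the source shows file A only as an image and does not specify a file format.

```python
from typing import Dict, Tuple

# Hypothetical layout: event_label <TAB> comma-separated keywords <TAB> score
FILE_A_TEXT = "development adjustment\ttransformation,upgrading\t0.2\n"


def load_keyword_rules(text: str) -> Dict[str, Tuple[str, float]]:
    """Build a keyword -> (event_label, emotion_score) lookup table."""
    rules: Dict[str, Tuple[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue                       # skip blank lines
        label, keywords, score = line.split("\t")
        value = float(score)
        if not -1.0 <= value <= 1.0:       # scores are expected in [-1, 1]
            raise ValueError(f"score out of range [-1, 1]: {value}")
        for keyword in keywords.split(","):
            rules[keyword.strip()] = (label.strip(), value)
    return rules


RULES = load_keyword_rules(FILE_A_TEXT)
```

Indexing by keyword keeps the lookup of the event label and score to a single dictionary access once a keyword has been found in the news text.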
Preferably, in this embodiment, the score range corresponding to the event keywords may be set to [-1, 1]. Further, the score range may be divided into sub-intervals of several grades, for example, into the following four grades: [-1, -0.75), [-0.75, -0.5), [-0.5, -0.04) and [-0.04, 1], wherein the sub-intervals [-1, -0.75) and [-0.75, -0.5) represent major negative news, [-0.5, -0.04) represents general negative news, and [-0.04, 1] represents positive news. Similarly, the emotion score range [-1, 1] obtained by the preset machine learning algorithm (such as a random forest algorithm) can also be divided into the sub-intervals of these four grades.
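The four example sub-intervals can be mapped to a grade index with a sorted list of boundaries, as in the sketch below. The names `GRADE_BOUNDS`, `grade_index` and `same_grade` are hypothetical, and the boundaries are the example values from the text rather than required values.

```python
from bisect import bisect_right

# Lower bounds of grades 1..3; grade 0 starts at -1 and grade 3 ends at 1.
GRADE_BOUNDS = (-0.75, -0.5, -0.04)
GRADE_MEANINGS = ("major negative", "major negative",
                  "general negative", "positive")


def grade_index(score: float) -> int:
    """Return 0..3 for [-1,-0.75), [-0.75,-0.5), [-0.5,-0.04), [-0.04,1]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range [-1, 1]: {score}")
    # bisect_right counts the boundaries <= score, so each boundary value
    # belongs to the interval it opens (intervals are closed on the left).
    return bisect_right(GRADE_BOUNDS, score)


def same_grade(score_a: float, score_b: float) -> bool:
    """True when both scores fall in the same sub-interval."""
    return grade_index(score_a) == grade_index(score_b)
```

Note that the first two grades both denote major negative news but are still distinct sub-intervals, so `same_grade(-0.9, -0.6)` is false in this sketch.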
Specifically, adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, includes the following steps:
traversing the title and the text of the news to be predicted;
if an event keyword in the first file is identified in the title and text of the news to be predicted, taking the emotion score corresponding to the identified event keyword in the first file as the final score of the news to be predicted, and taking the event label corresponding to the identified event keyword as the main business event of the news to be predicted;
and if no event keyword in the first file is identified in the title and text of the news to be predicted, taking the emotion score of the news to be predicted obtained by the preset machine learning algorithm as the final score of the news to be predicted.
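The steps above can be sketched as a single function. The name `adjust_with_keywords` and the rule table passed to it are hypothetical; plain substring matching and "first matching keyword wins" are assumptions, since the source does not say how keywords are identified or how multiple hits are resolved.

```python
from typing import Dict, Optional, Tuple


def adjust_with_keywords(
    title: str,
    body: str,
    ml_score: float,
    keyword_rules: Dict[str, Tuple[str, float]],
) -> Tuple[float, Optional[str]]:
    """Return (final_score, main_business_event) for one news item."""
    text = f"{title}\n{body}"              # traverse both title and text
    for keyword, (event_label, rule_score) in keyword_rules.items():
        if keyword in text:                # assumption: substring match
            return rule_score, event_label # rule score replaces model score
    return ml_score, None                  # no hit: keep the model score


EXAMPLE_RULES = {"transformation": ("development adjustment", 0.2)}
hit = adjust_with_keywords("Firm announces transformation", "", -0.3, EXAMPLE_RULES)
miss = adjust_with_keywords("Firm opens new office", "", -0.3, EXAMPLE_RULES)
```

In this sketch `hit` carries the rule score and the event label, while `miss` falls back to the model score with no main business event.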
Preferably, in other embodiments, adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, further includes the following steps:
if an event keyword in the first file is identified in the title and text of the news to be predicted, and the emotion score corresponding to the identified event keyword in the first file is not in the same grade as the emotion score obtained by the preset machine learning algorithm (being in the same grade means falling in the sub-interval corresponding to the same grade, such as [-0.04, 1]), performing a weighted calculation on the two scores, with the emotion score corresponding to the identified event keyword in the first file as the main weight, and taking the resulting weighted score as the final score of the news to be predicted.
Specifically, the weighted calculation includes: multiplying the emotion score corresponding to the identified event keyword in the first file by a first preset proportion (such as 60%), multiplying the emotion score obtained by the preset machine learning algorithm by a second preset proportion (such as 40%), and adding the two products to obtain the weighted score, which is used as the final score of the news to be predicted. The first preset proportion is larger than the second preset proportion, and the two proportions sum to 1.
For example, if the emotion score corresponding to the identified event keyword in the first file is 0.2 (in the sub-interval [-0.04, 1]) and the emotion score obtained by the preset machine learning algorithm is -0.2 (in the sub-interval [-0.5, -0.04)), the two scores are not in the same grade, so the score is adjusted with the score of 0.2 as the main weight.
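The weighted calculation can be sketched as below, assuming the caller has already established that the rule score and the model score fall in different grades. The function name `weighted_final_score` and the parameter `rule_weight` are hypothetical; the default of 0.6 is the example proportion from the text.

```python
def weighted_final_score(rule_score: float, ml_score: float,
                         rule_weight: float = 0.6) -> float:
    """Blend the rule score and the model score, rule score dominating."""
    # The first proportion must exceed the second, and the two must sum to 1,
    # which means the rule weight has to be above 0.5 and at most 1.
    if not 0.5 < rule_weight <= 1.0:
        raise ValueError("rule_weight must be in (0.5, 1.0]")
    ml_weight = 1.0 - rule_weight
    return rule_weight * rule_score + ml_weight * ml_score


# Example from the text: rule score 0.2, model score -0.2, proportions 60%/40%.
blended = weighted_final_score(0.2, -0.2)   # 0.12 - 0.08, approximately 0.04
```

With these example values the blended score of about 0.04 lies in [-0.04, 1], so the keyword rule still determines the direction while the model score tempers the magnitude.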
Preferably, in other embodiments, step S32 further includes the following step:
adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, according to a preset event label-event regular expression rule. The event label-event regular expression rule may be set as a second file (such as a second dynamic dictionary); that is, the specific content of the event label-event regular expression rule is recorded in file form (in this embodiment, the second file). In this embodiment, the second file may include the following: event labels (used to distinguish the categories of events, such as performance pre-increase), event regular expressions (set according to different business experience and related logic, as shown in file B below), and the emotion score corresponding to each event regular expression.
For example, the second file may be set in the format of the following table file B:
[Figure BDA0001395818320000171: table file B]
In file B, if content matching the event regular expression in the first row is identified in the news headline, the main business event of the news is the corresponding event label (e.g., "performance pre-increase"), and the emotion score of the news is 0.4.
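Matching against the second file can be sketched with Python's `re` module. The regular expression below is purely hypothetical, because file B appears in the source only as an image; only the event label "performance pre-increase" and the score 0.4 come from the text. The names `RegexRule`, `SECOND_FILE` and `match_regex_rule` are also introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional, Sequence


class RegexRule(NamedTuple):
    event_label: str
    pattern: "re.Pattern[str]"
    score: float


SECOND_FILE = (
    RegexRule(
        event_label="performance pre-increase",
        # Hypothetical pattern; the real expressions are set from business experience.
        pattern=re.compile(r"net profit.*expected to (?:increase|grow)", re.IGNORECASE),
        score=0.4,
    ),
)


def match_regex_rule(text: str,
                     rules: Sequence[RegexRule] = SECOND_FILE) -> Optional[RegexRule]:
    """Return the first rule whose pattern occurs anywhere in the text."""
    for rule in rules:
        if rule.pattern.search(text):      # search, not match: any position
            return rule
    return None


matched = match_regex_rule("Company X net profit is expected to increase by 50%")
```

Compiling each pattern once when the file is loaded avoids recompiling it for every news item.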
Preferably, the score range corresponding to the event regular expressions can be set to [-1, 1]. Further, the score range may be divided into sub-intervals of several grades, for example, into the following four grades: [-1, -0.75), [-0.75, -0.5), [-0.5, -0.04) and [-0.04, 1], wherein the sub-intervals [-1, -0.75) and [-0.75, -0.5) represent major negative news, [-0.5, -0.04) represents general negative news, and [-0.04, 1] represents positive news.
Further, in this case, adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, further includes the following step:
if content matching an event regular expression in the second file is identified in the title and text of the news to be predicted, taking the emotion score of that event regular expression in the second file as the final score of the news to be predicted, and taking the event label corresponding to the event regular expression as the main business event of the news to be predicted.
Further, in this case, adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, further includes the following step:
if no event keyword in the first file is identified in the title and text of the news to be predicted, and no content matching an event regular expression in the second file is identified either, taking the emotion score of the news to be predicted obtained by the preset machine learning algorithm as the final score of the news to be predicted.
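The fallback logic across both rule files can be sketched as a chain of matchers that is tried before the model score is used. Trying the keyword rules before the regular expression rules is an assumption, as the source does not state a precedence; the toy matchers and the regular expression are hypothetical stand-ins for the first and second files.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# A rule hit is (event_label, emotion_score); a matcher returns a hit or None.
Hit = Tuple[str, float]
Matcher = Callable[[str], Optional[Hit]]


def final_score(text: str, ml_score: float,
                matchers: Iterable[Matcher]) -> Tuple[float, Optional[str]]:
    """Return (final_score, main_business_event) for one news item."""
    for matcher in matchers:
        hit = matcher(text)
        if hit is not None:
            label, score = hit
            return score, label            # a rule hit replaces the model score
    return ml_score, None                  # neither file matched: keep model score


def keyword_matcher(text: str) -> Optional[Hit]:
    # Stand-in for the first file.
    return ("development adjustment", 0.2) if "transformation" in text else None


_PRE_INCREASE = re.compile(r"net profit.*expected to (?:rise|increase)", re.IGNORECASE)


def regex_matcher(text: str) -> Optional[Hit]:
    # Stand-in for the second file (hypothetical pattern).
    return ("performance pre-increase", 0.4) if _PRE_INCREASE.search(text) else None


MATCHERS = (keyword_matcher, regex_matcher)
```

Keeping the matchers in a sequence makes the precedence explicit and easy to change if a different order between the two files is wanted.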
Further, in this case, adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, further includes the following step:
if content matching an event regular expression in the second file is identified in the title and text of the news to be predicted, and the emotion score of that event regular expression is not in the same grade as the emotion score obtained by the preset machine learning algorithm (being in the same grade means falling in the sub-interval corresponding to the same grade, such as [-0.04, 1]), performing a weighted calculation on the two scores, with the emotion score of the event regular expression as the main weight, and taking the resulting weighted score as the final score of the news to be predicted.
Specifically, the weighted calculation includes: multiplying the emotion score corresponding to the event regular expression in the second file by a first preset proportion (such as 60%), multiplying the emotion score obtained by the preset machine learning algorithm by a second preset proportion (such as 40%), and adding the two products to obtain the weighted score, which serves as the final score of the news to be predicted. The first preset proportion is larger than the second preset proportion, and the two proportions sum to 1.
For example, if the emotion score of the event regular expression in the second file is 0.4 (in the sub-interval [-0.04, 1]) and the emotion score obtained by the preset machine learning algorithm is -0.4 (in the sub-interval [-0.5, -0.04)), the two scores are not in the same grade, so the score is adjusted with the score of 0.4 as the main weight.
Step S33: the emotion direction of the news to be predicted is determined according to the adjusted emotion score of the news to be predicted. Specifically, if the adjusted emotion score of the news to be predicted is located in a first score interval (such as [-1, -0.04)), the emotion direction of the news to be predicted is determined to be negative; and if the adjusted emotion score of the news to be predicted is located in a second score interval (such as [-0.04, 1]), the emotion direction of the news to be predicted is determined to be positive.
In other embodiments, the first score interval or the second score interval may be further subdivided. For example, the first score interval [-1, -0.04) may be further divided into the sub-intervals [-1, -0.5) and [-0.5, -0.04), wherein the sub-interval [-1, -0.5) represents major negative news and the sub-interval [-0.5, -0.04) represents general negative news.
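The direction judgment and the optional subdivision of the negative interval can be sketched as follows. The thresholds are the example values from the text, and the names `emotion_direction` and `negative_severity` are hypothetical.

```python
from typing import Optional

NEGATIVE_UPPER = -0.04   # example boundary between the first and second intervals
MAJOR_UPPER = -0.5       # example boundary inside the first (negative) interval


def emotion_direction(score: float) -> str:
    """Return "negative" for [-1, -0.04) and "positive" for [-0.04, 1]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range [-1, 1]: {score}")
    return "negative" if score < NEGATIVE_UPPER else "positive"


def negative_severity(score: float) -> Optional[str]:
    """Subdivide negative news; return None for positive news."""
    if emotion_direction(score) == "positive":
        return None
    return "major negative" if score < MAJOR_UPPER else "general negative"
```

Because both intervals are closed on the left, a score of exactly -0.04 is judged positive and a score of exactly -0.5 is judged general negative.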
Through steps S31-S33, the news emotion direction judgment method provided by the invention adjusts the news emotion score obtained by a machine learning algorithm (such as a random forest algorithm) according to the preset event label hit rules (including event keywords or event regular expressions). Compared with a traditional news emotion direction judgment method that relies only on a machine learning algorithm such as random forest, the method provided by the invention yields more accurate score calculation results, wider coverage and a better customer experience.
Further, to achieve the above object, the present invention also provides a computer readable storage medium (e.g. ROM/RAM, magnetic disk, optical disk) storing a news emotion direction determination system 20, where the news emotion direction determination system 20 can be executed by at least one processor 22, so that the at least one processor 22 executes the steps of the news emotion direction determination method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and this description is not to be construed as limiting the scope of the invention. The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. Additionally, although a logical order is shown in the flow diagrams, in some cases the steps shown or described may be performed in a different order.
Those skilled in the art can implement the invention with various modifications without departing from its scope and spirit; for example, features from one embodiment can be used in another embodiment to yield a further embodiment. All equivalent structures or equivalent processes derived from the contents of the specification and drawings of the invention, whether applied directly or indirectly in other related technical fields, are included in the scope of protection of the present invention.

Claims (8)

1. A news emotion direction judgment method applied to an electronic device, characterized by comprising the following steps:
performing semantic scoring on news to be predicted through a preset machine learning algorithm to obtain the emotion score of the news to be predicted;
adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, according to a preset event label-event keyword rule, wherein the event label-event keyword rule is set as a first file, and the first file comprises event labels for distinguishing event types, event keywords, and the emotion score corresponding to each event keyword;
or adjusting the emotion score of the news to be predicted, obtained by the preset machine learning algorithm, according to a preset event label-event regular expression rule, wherein the event label-event regular expression rule is set as a second file, the second file comprises event labels for distinguishing event types, event regular expressions, and the emotion score corresponding to each event regular expression, the score ranges of the event keywords and the event regular expressions are set to a preset interval, and the preset interval is divided into a plurality of sub-intervals; and
if the adjusted emotion score of the news to be predicted is located in a first score interval, determining that the emotion direction of the news to be predicted is negative; and if the adjusted emotion score of the news to be predicted is located in a second score interval, determining that the emotion direction of the news to be predicted is positive.
2. The news emotion direction judgment method as claimed in claim 1, wherein adjusting the emotion score of the news to be predicted obtained by the preset machine learning algorithm comprises:
traversing the title and the text of the news to be predicted;
if an event keyword in the first file is identified in the title and text of the news to be predicted, taking the emotion score corresponding to the identified event keyword in the first file as the final score of the news to be predicted, and taking the event label corresponding to the identified event keyword as the main business event of the news to be predicted; and
if no event keyword in the first file is identified in the title and text of the news to be predicted, taking the emotion score of the news to be predicted obtained by the preset machine learning algorithm as the final score of the news to be predicted.
3. The news emotion direction judgment method as claimed in claim 2, wherein adjusting the emotion score of the news to be predicted obtained by the preset machine learning algorithm further comprises:
if an event keyword in the first file is identified in the title and text of the news to be predicted, and the emotion score corresponding to the identified event keyword in the first file is not in the same grade as the emotion score obtained by the preset machine learning algorithm, performing a weighted calculation on the emotion score corresponding to the identified event keyword in the first file and the emotion score obtained by the preset machine learning algorithm, with the former as the main weight, to obtain a weighted score as the final score of the news to be predicted.
4. The news emotion direction judgment method of claim 3, wherein the weighted calculation includes:
multiplying the emotion score corresponding to the identified event keyword in the first file by a first preset proportion, and multiplying the emotion score obtained by the preset machine learning algorithm by a second preset proportion; and
adding the two products to obtain a weighted score as the final score of the news to be predicted, wherein the first preset proportion is greater than the second preset proportion, and the sum of the first preset proportion and the second preset proportion is 1.
5. The news emotion direction judgment method as claimed in claim 1, wherein adjusting the emotion score of the news to be predicted obtained by the preset machine learning algorithm comprises:
if content matching an event regular expression in the second file is identified in the title and text of the news to be predicted, taking the emotion score of that event regular expression in the second file as the final score of the news to be predicted, and taking the event label corresponding to the event regular expression as the main business event of the news to be predicted; and
if no event keyword in the first file is identified in the title and text of the news to be predicted, and no content matching an event regular expression in the second file is identified, taking the emotion score of the news to be predicted obtained by the preset machine learning algorithm as the final score of the news to be predicted.
6. The news emotion direction judgment method as claimed in claim 5, wherein adjusting the emotion score of the news to be predicted obtained by the preset machine learning algorithm further comprises:
if content matching an event regular expression in the second file is identified in the title and text of the news to be predicted, and the emotion score of that event regular expression in the second file is not in the same grade as the emotion score obtained by the preset machine learning algorithm, performing a weighted calculation on the emotion score corresponding to the event regular expression in the second file and the emotion score obtained by the preset machine learning algorithm, with the former as the main weight, to obtain a weighted score as the final score of the news to be predicted.
7. An electronic device, comprising a memory and a processor, wherein the memory stores a news emotion direction determination system operable on the processor, and when the news emotion direction determination system is executed by the processor, the processor performs the steps of the news emotion direction determination method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a news emotion direction determination system, the news emotion direction determination system being executable by at least one processor to cause the at least one processor to perform the steps of the news emotion direction determination method as claimed in any one of claims 1 to 6.
CN201710775417.9A 2017-08-31 2017-08-31 News emotion direction judgment method, electronic device and computer readable storage medium Active CN107688651B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710775417.9A CN107688651B (en) 2017-08-31 2017-08-31 News emotion direction judgment method, electronic device and computer readable storage medium
PCT/CN2017/108811 WO2019041528A1 (en) 2017-08-31 2017-10-31 Method, electronic apparatus, and computer readable storage medium for determining polarity of news sentiment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710775417.9A CN107688651B (en) 2017-08-31 2017-08-31 News emotion direction judgment method, electronic device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107688651A (en) 2018-02-13
CN107688651B (en) 2021-11-16

Family

ID=61155954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710775417.9A Active CN107688651B (en) 2017-08-31 2017-08-31 News emotion direction judgment method, electronic device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN107688651B (en)
WO (1) WO2019041528A1 (en)

Also Published As

Publication number Publication date
WO2019041528A1 (en) 2019-03-07
CN107688651A (en) 2018-02-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant