CN112347230B

CN112347230B - Enterprise public opinion data analysis method based on Word2Vec

Info

Publication number: CN112347230B
Application number: CN202011282421.XA
Authority: CN
Inventors: 瞿学新; 陈劲
Original assignee: Shanghai Pinjian Intelligent Technology Co ltd
Current assignee: Shanghai Pinjian Intelligent Technology Co ltd
Priority date: 2020-11-16
Filing date: 2020-11-16
Publication date: 2024-04-19
Anticipated expiration: 2040-11-16
Also published as: CN112347230A

Abstract

The invention discloses a Word2Vec-based method for analyzing enterprise public opinion data, comprising three steps: collecting and organizing the text data, determining the emotion dictionary, and drawing a conclusion. The emotion dictionary is expanded with Word2Vec, and the emotional tendency of each text is analyzed by combining word frequency, text length, and read count, which avoids the distortion that arises when text length and read count are ignored. The method is used to analyze the emotional tendency of public opinion about an enterprise, helping the enterprise or its managers analyze public opinion effectively and thereby avoid a crisis of brand and customer trust.

Description

Enterprise public opinion data analysis method based on Word2Vec

Technical Field

The invention relates to the technical field of natural language processing, and in particular to an enterprise public opinion data analysis method based on Word2Vec.

Background

With the spread of Internet applications and the rise of emerging media such as microblogs, public opinion now travels through many channels, spreads quickly, and reaches a wide audience, which poses new challenges for enterprise management. Negative public opinion can damage an enterprise's brand, reduce customer trust, and cause economic loss. It is therefore important to identify public opinion about an enterprise within massive volumes of information and to respond in time to steer its direction.

At present, with the rise of artificial intelligence and the accumulation of data on platforms such as microblogs, natural language models are being introduced to predict public opinion sentiment and thereby monitor Internet public opinion about enterprises. Effectively analyzing news and comment texts about an enterprise, and deriving a public opinion emotion value from them, is therefore of practical significance.

Disclosure of Invention

The invention aims to provide an enterprise public opinion data analysis method based on Word2Vec, so as to solve the problems described in the background above.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A Word2Vec-based enterprise public opinion data analysis method comprises the following steps: collecting and organizing, determining an emotion dictionary, and drawing a conclusion.

Step 1, collecting and organizing: define the stop words for the text training set, then segment each Chinese text in the text data set into words and filter out the stop words to obtain a preprocessed text training set;

Wherein, step 1.1: define the text data Txt = {txt₁, txt₂, ……, txt_num}, where num is the total number of texts;

Step 1.2: define a text stop word set S = {st₁, st₂, ……, st_sn}, where sn is the number of stop words;

Step 1.3: segment the texts in Txt into words and filter out the stop words in S; text preprocessing yields ft = {ft₁, ft₂, ……, ft_num}, where ft_p = {fw₁, fw₂, ……, fw_m} is the word set obtained by segmenting the p-th text, and p ∈ [1, num].
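Steps 1.1 to 1.3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `preprocess` and the `segment` parameter are hypothetical, and whitespace splitting stands in for a real Chinese word segmenter, which would be supplied as `segment` in practice.

```python
from typing import Callable, Iterable, List, Set


def preprocess(texts: Iterable[str],
               stop_words: Set[str],
               segment: Callable[[str], List[str]] = str.split) -> List[List[str]]:
    """Segment each text and filter stop words (sketch of steps 1.1-1.3).

    `segment` is a placeholder: whitespace splitting is used here only so
    the example runs without a Chinese segmentation library.
    """
    ft = []
    for txt in texts:                      # txt_1 ... txt_num
        words = segment(txt)               # word segmentation
        ft.append([w for w in words if w not in stop_words])
    return ft


# Toy data: Txt with num = 2 texts and a stop word set S.
txt = ["the service is very good", "the product is bad"]
s = {"the", "is", "very"}
ft = preprocess(txt, s)
```

Each element of `ft` corresponds to one ft_p, the filtered word list of the p-th text.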

Step 2, determining an emotion dictionary: define an emotion dictionary, train Word2Vec on the preprocessed text set, and use cosine similarity to add words not yet recorded in the emotion dictionary, obtaining an expanded emotion dictionary;

Wherein, step 2.1: define an initial emotion dictionary comprising an emotion word set ew = {ew₁, ew₂, ……, ew_s} and a corresponding emotion value set ev;

Step 2.2: remove duplicate words from the texts in the text set ft to obtain a word set t = {t₁, t₂, ……, t_b};

Step 2.3: train Word2Vec on the text set ft to obtain word vectors for the words in t, and use cosine similarity to calculate the similarity between every pair of words, thereby obtaining, for any word, the set of words whose similarity to it is greater than β, together with the corresponding similarity values; β defaults to 0.7;

Step 2.4: let c be a loop variable used to traverse the word set t, and initialize it to 1;

Step 2.5: when the loop variable c <= b, execute step 2.6; otherwise execute step 2.10;

Step 2.6: when both conditions on the word t_c are satisfied (the condition expressions are missing from this text), execute step 2.7; otherwise execute step 2.9;

Step 2.7: calculate the emotion value of the word t_c by the corresponding formula;

Step 2.8: add the word t_c to the emotion dictionary, ew = ew ∪ {t_c}, and record its emotion value in ev;

Step 2.9: set the loop variable c = c + 1 and return to step 2.5;

Step 2.10: obtain the supplemented emotion word set ew and the corresponding emotion value set ev;
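The expansion loop of steps 2.3 to 2.10 can be sketched as follows. Several points are assumptions rather than the patent's method: the formula for the emotion value of a new word is not legible in this text, so a similarity-weighted mean of the values of similar dictionary words is used as a stand-in; the membership test in step 2.6 is assumed to be "t_c is not yet in the dictionary and has at least one dictionary word with similarity above β"; and the toy vectors stand in for vectors that a trained Word2Vec model would provide. The names `cosine` and `expand_dictionary` are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_dictionary(vectors: Dict[str, List[float]],
                      ev: Dict[str, float],
                      beta: float = 0.7) -> Dict[str, float]:
    """Return a copy of `ev` extended with words similar to dictionary words."""
    expanded = dict(ev)
    for t_c, vec in vectors.items():       # traverse the word set t
        if t_c in ev:                      # already recorded in the dictionary
            continue
        # Dictionary words whose similarity to t_c exceeds beta.
        sims = {}
        for w in ev:
            if w in vectors:
                s = cosine(vec, vectors[w])
                if s > beta:
                    sims[w] = s
        if not sims:                       # ASSUMED condition for step 2.6
            continue
        # ASSUMED formula for step 2.7: similarity-weighted mean of values.
        expanded[t_c] = (sum(s * ev[w] for w, s in sims.items())
                         / sum(sims.values()))
    return expanded


# Toy 2-D vectors standing in for trained Word2Vec vectors.
vectors = {"good": [1.0, 0.0], "great": [0.9, 0.1],
           "bad": [-1.0, 0.0], "table": [0.0, 1.0]}
ev = {"good": 1.0, "bad": -1.0}
expanded = expand_dictionary(vectors, ev)
```

In this sketch new words are scored only against the initial dictionary, so the result does not depend on traversal order; whether the original method lets newly added words influence later ones cannot be determined from this text.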

Step 3, drawing a conclusion: calculate the emotion values of the preprocessed text set using the expanded emotion dictionary and an improved emotion dictionary calculation method, obtaining the emotion value of the enterprise public opinion;

Step 3.1: let r be a loop variable used to traverse the text set ft, and initialize it to 1;

Step 3.2: when the loop variable r <= num, execute step 3.3; otherwise execute step 3.5;

Step 3.3: calculate the emotion value score_r of the text ft_r by the corresponding formula,

where f_j is the word frequency of the word j in the text ft_r, rc_r is the read count of the text ft_r, min_rc and max_rc are the minimum and maximum read counts in the text set ft, dl_r is the length of the text ft_r, and avgdl is the average text length in the text set ft;

Step 3.4: set the loop variable r = r + 1 and return to step 3.2;

Step 3.5: calculate the emotion value of the enterprise public opinion by formula from the emotion values of the texts in the text set ft.
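The scoring loop of steps 3.1 to 3.5 can be sketched as below. The patent's scoring formulas are not legible in this text, so the combination used here is purely an assumption chosen to exercise the named quantities: the frequency-weighted sum of emotion values is divided by the relative length dl_r / avgdl and multiplied by a read weight of 1 plus the min-max normalized read count, and the overall value is taken as the mean of the text scores. The names `text_score` and `corpus_score` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def text_score(words: List[str], ev: Dict[str, float], rc_r: int,
               min_rc: int, max_rc: int, avgdl: float) -> float:
    """ASSUMED score for one text; not the formula from the patent."""
    dl_r = len(words)                      # length of the text
    if dl_r == 0:
        return 0.0
    f = Counter(words)                     # f_j: frequency of each word j
    raw = sum(f_j * ev[w] for w, f_j in f.items() if w in ev)
    span = max_rc - min_rc
    # Min-max normalized read count, shifted so the weight lies in [1, 2].
    read_weight = 1.0 + ((rc_r - min_rc) / span if span else 0.0)
    length_norm = dl_r / avgdl             # longer texts are scaled down
    return read_weight * raw / length_norm


def corpus_score(ft: List[List[str]], reads: List[int],
                 ev: Dict[str, float]) -> Tuple[List[float], float]:
    """Score every text, then average (ASSUMED aggregation for step 3.5)."""
    if not ft:
        return [], 0.0
    avgdl = sum(len(words) for words in ft) / len(ft)
    min_rc, max_rc = min(reads), max(reads)
    scores = [text_score(words, ev, rc, min_rc, max_rc, avgdl)
              for words, rc in zip(ft, reads)]
    return scores, sum(scores) / len(scores)


ft = [["service", "good", "good"], ["product", "bad"]]
reads = [100, 300]
ev = {"good": 1.0, "bad": -1.0}
scores, overall = corpus_score(ft, reads, ev)
```

With this toy data the more widely read negative text outweighs the positive one, so the overall value is negative; a different weighting would change that outcome.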

Compared with the prior art, the invention has the following beneficial effects: the emotion dictionary is expanded with Word2Vec, and the emotional tendency of each text is analyzed by combining word frequency, text length, and read count, which avoids the distortion that arises when text length and read count are ignored. The method is used to analyze the emotional tendency of public opinion about an enterprise, helping the enterprise or its managers analyze public opinion effectively and thereby avoid a crisis of brand and customer trust.

Drawings

Fig. 1 is a general flow chart of the present invention.

Fig. 2 is a flowchart of obtaining the text training set through text preprocessing in Fig. 1.

Fig. 3 is a flowchart of expanding the emotion dictionary in Fig. 1.

Fig. 4 is a flowchart of analyzing the emotion values of the training texts in Fig. 1.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meaning of these terms in the present invention will be understood by those of ordinary skill in the art according to the specific case.

Referring to Figs. 1-2, a Word2Vec-based enterprise public opinion data analysis method comprises the following steps: collecting and organizing, determining an emotion dictionary, and drawing a conclusion.

As shown in Fig. 2, step 2, determining an emotion dictionary: define an emotion dictionary, train Word2Vec on the preprocessed text set, and use cosine similarity to add words not yet recorded in the emotion dictionary, obtaining an expanded emotion dictionary;

As shown in Fig. 3, step 3, drawing a conclusion: calculate the emotion values of the preprocessed text set using the expanded emotion dictionary and an improved emotion dictionary calculation method, obtaining the emotion value of the enterprise public opinion;

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this specification is organized by embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way for clarity only; those skilled in the art should treat it as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims

1. A Word2Vec-based enterprise public opinion data analysis method, comprising the following steps: collecting and organizing, determining an emotion dictionary, and drawing a conclusion; specifically,

Step 1 includes step 1.1: defining text data Txt = {txt₁, txt₂, ……, txt_num}, wherein num is the total number of texts;

Step 1.3: performing word segmentation on the texts in Txt and filtering out the stop words in S, text preprocessing yielding ft = {ft₁, ft₂, ……, ft_num}, wherein ft_p = {fw₁, fw₂, ……, fw_m} is the word set obtained by segmenting the p-th text, p ∈ [1, num];

Step 2 includes step 2.1: defining an initial emotion dictionary containing an emotion word set ew = {ew₁, ew₂, ……, ew_s} and a corresponding emotion value set ev;

Step 2.2: removing duplicate words from the texts in the text set ft to obtain a word set t = {t₁, t₂, ……, t_b};

Step 2.3: training Word2Vec on the text set ft to obtain word vectors for the words in t, and using cosine similarity to calculate the similarity between every pair of words, thereby obtaining, for any word, the set of words w_b ∈ t whose similarity to it is greater than β, together with the corresponding similarity values; β defaults to 0.7;

Step 2.7: calculating the emotion value of the word t_c by the corresponding formula;

Step 2.8: adding the word t_c to the emotion dictionary, ew = ew ∪ {t_c}, and recording its emotion value in ev;

Step 2.9: setting the loop variable c = c + 1 and returning to step 2.5;

Step 2.10: obtaining the supplemented emotion word set ew and the corresponding emotion value set ev;

Step 3 includes step 3.1: letting r be a loop variable used to traverse the text set ft, and initializing it to 1;

Step 3.2: when the loop variable r <= num, executing step 3.3; otherwise executing step 3.5;

Step 3.3: calculating the emotion value score_r of the text ft_r by the corresponding formula,

where f_j is the word frequency of the word j in the text ft_r, rc_r is the read count of the text ft_r, min_rc and max_rc are the minimum and maximum read counts in the text set ft, dl_r is the length of the text ft_r, and avgdl is the average text length in the text set ft;

Step 3.4: setting the loop variable r = r + 1 and returning to step 3.2;