CN112347230B - Enterprise public opinion data analysis method based on Word2Vec - Google Patents

Enterprise public opinion data analysis method based on Word2Vec

Publication number: CN112347230B
Authority: CN (China)
Prior art keywords: text, emotion, word, dictionary, public opinion
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202011282421.XA
Other languages: Chinese (zh)
Other versions: CN112347230A (en)
Inventors: 瞿学新, 陈劲
Current Assignee: Shanghai Pinjian Intelligent Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shanghai Pinjian Intelligent Technology Co ltd
Priority date: 2020-11-16 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-11-16
Publication date: 2024-04-19
Application filed by Shanghai Pinjian Intelligent Technology Co ltd
Priority to CN202011282421.XA
Publication of CN112347230A
Application granted
Publication of CN112347230B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an enterprise public opinion data analysis method based on Word2Vec, comprising three steps: collecting and sorting the texts, determining the emotion dictionary, and drawing a conclusion. The emotion dictionary is expanded using Word2Vec, and the emotional tendency of each text is analyzed by combining word frequency, text length and read count, which avoids the distortion that arises when text length and read count are ignored. The method analyzes the emotional tendency of public opinion about an enterprise, helping the enterprise or its managers assess that opinion effectively and thereby avoid brand and customer-trust crises.

Description

Enterprise public opinion data analysis method based on Word2Vec
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an enterprise public opinion data analysis method based on Word2Vec.
Background
With the spread of Internet applications and the rise of emerging media such as microblogs, public opinion now travels through many channels, spreads quickly and reaches a wide audience, which poses new challenges for enterprise management. Negative public opinion can damage an enterprise's brand, reduce customer trust and cause economic loss. It is therefore important to be able to analyze public opinion about an enterprise within massive volumes of information and to steer its direction in time.
At present, with the rise of artificial intelligence and the accumulation of data on platforms such as microblogs, natural language models are used to predict public opinion sentiment and thereby monitor Internet public opinion about enterprises. Effectively analyzing news and comment texts about an enterprise, and deriving a public opinion emotion value from them, is therefore of practical significance.
Disclosure of Invention
The invention aims to provide an enterprise public opinion data analysis method based on Word2Vec that solves the problems described in the Background.
To achieve this, the invention provides the following technical solution:
An enterprise public opinion data analysis method based on Word2Vec comprises three steps: collecting and sorting the texts, determining the emotion dictionary, and drawing a conclusion.
Step 1, collecting and sorting: define the stop words of the text training set, then segment each Chinese text in the text data set into words and filter out the stop words to obtain the preprocessed text training set;
Step 1.1: define the text data $Txt = \{txt_1, txt_2, \ldots, txt_{num}\}$, where $num$ is the total number of texts;
Step 1.2: define the stop word set $S = \{st_1, st_2, \ldots, st_{sn}\}$, where $sn$ is the number of stop words;
Step 1.3: segment each text in $Txt$ and filter out the stop words in $S$; preprocessing yields $ft = \{ft_1, ft_2, \ldots, ft_{num}\}$, where $ft_p = \{fw_1, fw_2, \ldots, fw_m\}$ is the set of words obtained from the p-th text and $p \in [1, num]$.
Step 2, determining the emotion dictionary: define an emotion dictionary, train Word2Vec on the preprocessed text set, and use cosine similarity to add words that the emotion dictionary does not yet record, obtaining an expanded emotion dictionary;
Step 2.1: define an initial emotion dictionary comprising the emotion word set $ew = \{ew_1, ew_2, \ldots, ew_s\}$ and the corresponding emotion value set $ev = \{ev_1, ev_2, \ldots, ev_s\}$;
Step 2.2: remove repeated words from the texts in the text set $ft$ to obtain the word set $t = \{t_1, t_2, \ldots, t_b\}$;
Step 2.3: train Word2Vec on the text set $ft$ to obtain a word vector for each word in $t$, and compute the cosine similarity between every pair of words. For each word this yields the set of words whose similarity to it is greater than $\beta$, together with the corresponding similarity values [symbols not reproduced in this text]; $\beta$ defaults to 0.7;
Step 2.4: set $c$ as the loop variable for traversing the word set $t$, and initialise it to 1;
Step 2.5: if $c \le b$, execute step 2.6; otherwise execute step 2.10;
Step 2.6: if both conditions [not reproduced in this text] hold, execute step 2.7; otherwise execute step 2.9;
Step 2.7: calculate the emotion value of the word $t_c$ using the formula [formula not reproduced in this text];
Step 2.8: add the word $t_c$ to the emotion dictionary, $ew = ew \cup \{t_c\}$, and record its emotion value in $ev$;
Step 2.9: set $c = c + 1$ and return to step 2.5;
Step 2.10: obtain the supplemented emotion word set $ew$ and the corresponding emotion value set $ev$;
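For reference, the cosine similarity named in step 2.3 is the standard one, and the thresholded neighbour set it produces can be written as below. The symbols $\vec{v}_{t_a}$ (the Word2Vec vector of word $t_a$) and $N_\beta$ are notation assumed here for illustration; they are not the patent's own symbols, which are not reproduced in this text.

```latex
\cos(\vec{v}_{t_a}, \vec{v}_{t_b})
  = \frac{\vec{v}_{t_a} \cdot \vec{v}_{t_b}}
         {\lVert \vec{v}_{t_a} \rVert \, \lVert \vec{v}_{t_b} \rVert},
\qquad
N_\beta(t_a) = \{\, t_b \in t : b \neq a,\ \cos(\vec{v}_{t_a}, \vec{v}_{t_b}) > \beta \,\},
\quad \beta = 0.7 \text{ by default}
```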
Step 3, drawing a conclusion: calculate the emotion values of the preprocessed text set using the expanded emotion dictionary and an improved emotion dictionary calculation method, obtaining the emotion value of the enterprise public opinion;
Step 3.1: set $r$ as the loop variable for traversing the text set $ft$, and initialise it to 1;
Step 3.2: if $r \le num$, execute step 3.3; otherwise execute step 3.5;
Step 3.3: calculate the emotion value $score_r$ of the text $ft_r$ using the formula [formula not reproduced in this text],
where $f_j$ is the frequency of word $j$ in the text $ft_r$, $rc_r$ is the read count of the text $ft_r$, $min\_rc$ and $max\_rc$ are the minimum and maximum read counts in the text set $ft$, $dl_r$ is the length of the text $ft_r$, and $avgdl$ is the average text length in the text set $ft$;
Step 3.4: set $r = r + 1$ and return to step 3.2;
Step 3.5: calculate the emotion value of the enterprise public opinion from the emotion values of the texts in $ft$ using the formula [formula not reproduced in this text].
Compared with the prior art, the invention has the following beneficial effects: the emotion dictionary is expanded using Word2Vec, and the emotional tendency of each text is analyzed by combining word frequency, text length and read count, which avoids the distortion that arises when text length and read count are ignored. The method analyzes the emotional tendency of public opinion about an enterprise, helping the enterprise or its managers assess that opinion effectively and thereby avoid brand and customer-trust crises.
Drawings
Fig. 1 is a general flow chart of the present invention.
Fig. 2 is a flowchart of the text preprocessing in Fig. 1 that produces the text training set.
Fig. 3 is a flowchart of the emotion dictionary expansion in Fig. 1.
Fig. 4 is a flowchart of the emotion value analysis of the training texts in Fig. 1.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Referring to Figs. 1-2, an enterprise public opinion data analysis method based on Word2Vec comprises three steps: collecting and sorting the texts, determining the emotion dictionary, and drawing a conclusion.
Step 1, collecting and sorting: define the stop words of the text training set, then segment each Chinese text in the text data set into words and filter out the stop words to obtain the preprocessed text training set;
Step 1.1: define the text data $Txt = \{txt_1, txt_2, \ldots, txt_{num}\}$, where $num$ is the total number of texts;
Step 1.2: define the stop word set $S = \{st_1, st_2, \ldots, st_{sn}\}$, where $sn$ is the number of stop words;
Step 1.3: segment each text in $Txt$ and filter out the stop words in $S$; preprocessing yields $ft = \{ft_1, ft_2, \ldots, ft_{num}\}$, where $ft_p = \{fw_1, fw_2, \ldots, fw_m\}$ is the set of words obtained from the p-th text and $p \in [1, num]$.
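Steps 1.1 to 1.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `preprocess`, the `segment` parameter and the toy English corpus are introduced here for illustration, and whitespace splitting merely stands in for a real Chinese word segmenter.

```python
from typing import Callable, Iterable, List, Set


def preprocess(
    texts: Iterable[str],
    stop_words: Set[str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[List[str]]:
    """Segment each text and drop stop words (steps 1.1 to 1.3).

    ASSUMPTION: `segment` defaults to whitespace splitting, which only
    works for pre-segmented input; real Chinese text needs a proper
    word segmenter passed in here.
    """
    ft = []
    for txt in texts:
        words = segment(txt)
        # Keep non-empty tokens that are not in the stop word set S.
        ft.append([w for w in words if w and w not in stop_words])
    return ft


# Toy, pre-segmented corpus (hypothetical data, for illustration only).
Txt = ["the product is very good", "the service is bad"]
S = {"the", "is"}
ft = preprocess(Txt, S)
print(ft)
```

Passing the segmenter as a parameter keeps the sketch free of third-party dependencies; any function that maps a string to a list of tokens can be substituted.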
As shown in Fig. 2, step 2, determining the emotion dictionary: define an emotion dictionary, train Word2Vec on the preprocessed text set, and use cosine similarity to add words that the emotion dictionary does not yet record, obtaining an expanded emotion dictionary;
Step 2.1: define an initial emotion dictionary comprising the emotion word set $ew = \{ew_1, ew_2, \ldots, ew_s\}$ and the corresponding emotion value set $ev = \{ev_1, ev_2, \ldots, ev_s\}$;
Step 2.2: remove repeated words from the texts in the text set $ft$ to obtain the word set $t = \{t_1, t_2, \ldots, t_b\}$;
Step 2.3: train Word2Vec on the text set $ft$ to obtain a word vector for each word in $t$, and compute the cosine similarity between every pair of words. For each word this yields the set of words whose similarity to it is greater than $\beta$, together with the corresponding similarity values [symbols not reproduced in this text]; $\beta$ defaults to 0.7;
Step 2.4: set $c$ as the loop variable for traversing the word set $t$, and initialise it to 1;
Step 2.5: if $c \le b$, execute step 2.6; otherwise execute step 2.10;
Step 2.6: if both conditions [not reproduced in this text] hold, execute step 2.7; otherwise execute step 2.9;
Step 2.7: calculate the emotion value of the word $t_c$ using the formula [formula not reproduced in this text];
Step 2.8: add the word $t_c$ to the emotion dictionary, $ew = ew \cup \{t_c\}$, and record its emotion value in $ev$;
Step 2.9: set $c = c + 1$ and return to step 2.5;
Step 2.10: obtain the supplemented emotion word set $ew$ and the corresponding emotion value set $ev$;
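The loop in steps 2.3 to 2.10 can be sketched as follows. Several things here are assumptions rather than the patent's method: the formula for a new word's emotion value is not available in this text, so the sketch substitutes a similarity-weighted mean over seed words; the step 2.6 condition is read as "the word is not yet in the dictionary and has at least one seed word above the threshold"; and the hand-written 2-d vectors stand in for embeddings that a trained Word2Vec model would supply. The names `cosine` and `expand_dictionary` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_dictionary(
    vectors: Dict[str, List[float]],
    seed_values: Dict[str, float],
    beta: float = 0.7,
) -> Dict[str, float]:
    """Return a copy of the seed dictionary extended with similar words.

    ASSUMPTION: a new word's value is the similarity-weighted mean of the
    values of seed words whose cosine similarity to it exceeds `beta`.
    The source's own formula for step 2.7 is not available.
    """
    expanded = dict(seed_values)
    for word, vec in vectors.items():       # traversal of t (steps 2.4, 2.5, 2.9)
        if word in seed_values:             # already recorded in the dictionary
            continue
        neighbours = []
        for seed, value in seed_values.items():
            if seed in vectors:
                sim = cosine(vec, vectors[seed])
                if sim > beta:              # threshold from step 2.3
                    neighbours.append((sim, value))
        if neighbours:                      # assumed reading of step 2.6
            total = sum(sim for sim, _ in neighbours)
            # Assumed step 2.7 formula, then step 2.8 (add to dictionary).
            expanded[word] = sum(sim * val for sim, val in neighbours) / total
    return expanded


# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings.
vectors = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "table": [0.0, 1.0],
}
ev = expand_dictionary(vectors, {"good": 1.0, "bad": -1.0})
print(sorted(ev))
```

In this sketch only the initial seed words contribute values, so the result does not depend on traversal order. The steps as described add each word to the dictionary inside the loop, which may let earlier additions influence later ones; that variant is not shown.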
As shown in Fig. 3, step 3, drawing a conclusion: calculate the emotion values of the preprocessed text set using the expanded emotion dictionary and an improved emotion dictionary calculation method, obtaining the emotion value of the enterprise public opinion;
Step 3.1: set $r$ as the loop variable for traversing the text set $ft$, and initialise it to 1;
Step 3.2: if $r \le num$, execute step 3.3; otherwise execute step 3.5;
Step 3.3: calculate the emotion value $score_r$ of the text $ft_r$ using the formula [formula not reproduced in this text],
where $f_j$ is the frequency of word $j$ in the text $ft_r$, $rc_r$ is the read count of the text $ft_r$, $min\_rc$ and $max\_rc$ are the minimum and maximum read counts in the text set $ft$, $dl_r$ is the length of the text $ft_r$, and $avgdl$ is the average text length in the text set $ft$;
Step 3.4: set $r = r + 1$ and return to step 3.2;
Step 3.5: calculate the emotion value of the enterprise public opinion from the emotion values of the texts in $ft$ using the formula [formula not reproduced in this text].
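The scoring in steps 3.1 to 3.5 combines word frequency, read count and text length, but the exact formulas are not available in this text. The sketch below therefore uses an assumed combination purely to show how the named quantities ($f_j$, $rc_r$, $min\_rc$, $max\_rc$, $dl_r$, $avgdl$) could interact: a frequency-weighted sum of emotion values, boosted by the min-max normalised read count and divided by the length ratio $dl_r / avgdl$. The final aggregation as a plain mean is likewise an assumption, as are the function name `score_texts` and the toy data.

```python
from collections import Counter
from typing import Dict, List


def score_texts(
    ft: List[List[str]],
    reads: List[int],
    ev: Dict[str, float],
) -> List[float]:
    """Per-text emotion scores from word frequency, reads and length.

    ASSUMED form (the source's formula is not available):
        score_r = sum_j(f_j * ev_j) * (1 + read_weight_r) / (dl_r / avgdl)
    where read_weight_r = (rc_r - min_rc) / (max_rc - min_rc).
    Requires a non-empty text set.
    """
    min_rc, max_rc = min(reads), max(reads)
    span = max_rc - min_rc
    avgdl = sum(len(words) for words in ft) / len(ft)
    scores = []
    for words, rc in zip(ft, reads):
        freq = Counter(words)  # f_j for each word j in this text
        raw = sum(n * ev[w] for w, n in freq.items() if w in ev)
        read_weight = (rc - min_rc) / span if span else 0.0
        length_ratio = len(words) / avgdl if words else 1.0  # dl_r / avgdl
        scores.append(raw * (1.0 + read_weight) / length_ratio)
    return scores


# Toy inputs (hypothetical data, for illustration only).
ft = [["product", "very", "good"], ["service", "bad"]]
reads = [100, 300]
ev = {"good": 1.0, "bad": -1.0}
scores = score_texts(ft, reads, ev)
overall = sum(scores) / len(scores)  # ASSUMED aggregation: plain mean
print(scores, overall)
```

With these toy inputs the short, widely read negative text outweighs the longer, less read positive one, so the overall value is negative. That is the kind of effect the description attributes to taking read count and length into account.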
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is organized by embodiment, an embodiment does not necessarily contain only one independent technical solution; the description is arranged this way for clarity only. Those skilled in the art should read the specification as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (1)

1. An enterprise public opinion data analysis method based on Word2Vec, comprising the following steps: collecting and sorting, determining an emotion dictionary, and drawing a conclusion; specifically:
Step 1, collecting and sorting: define the stop words of the text training set, then segment each Chinese text in the text data set into words and filter out the stop words to obtain the preprocessed text training set;
Step 2, determining the emotion dictionary: define an emotion dictionary, train Word2Vec on the preprocessed text set, and use cosine similarity to add words that the emotion dictionary does not yet record, obtaining an expanded emotion dictionary;
Step 3, drawing a conclusion: calculate the emotion values of the preprocessed text set using the expanded emotion dictionary and an improved emotion dictionary calculation method, obtaining the emotion value of the enterprise public opinion;
Step 1 includes step 1.1: define the text data $Txt = \{txt_1, txt_2, \ldots, txt_{num}\}$, where $num$ is the total number of texts;
Step 1.2: define the stop word set $S = \{st_1, st_2, \ldots, st_{sn}\}$, where $sn$ is the number of stop words;
Step 1.3: segment each text in $Txt$ and filter out the stop words in $S$; preprocessing yields $ft = \{ft_1, ft_2, \ldots, ft_{num}\}$, where $ft_p = \{fw_1, fw_2, \ldots, fw_m\}$ is the set of words obtained from the p-th text and $p \in [1, num]$;
Step 2 includes step 2.1: define an initial emotion dictionary comprising the emotion word set $ew = \{ew_1, ew_2, \ldots, ew_s\}$ and the corresponding emotion value set $ev = \{ev_1, ev_2, \ldots, ev_s\}$;
Step 2.2: remove repeated words from the texts in the text set $ft$ to obtain the word set $t = \{t_1, t_2, \ldots, t_b\}$;
Step 2.3: train Word2Vec on the text set $ft$ to obtain a word vector for each word in $t$, and compute the cosine similarity between every pair of words. For each word this yields the set of words whose similarity to it is greater than $\beta$, together with the corresponding similarity values, where $w_b \in t$ and each $w_b$ has a corresponding similarity [symbols not reproduced in this text]; $\beta$ defaults to 0.7;
Step 2.4: set $c$ as the loop variable for traversing the word set $t$, and initialise it to 1;
Step 2.5: if $c \le b$, execute step 2.6; otherwise execute step 2.10;
Step 2.6: if both conditions [not reproduced in this text] hold, execute step 2.7; otherwise execute step 2.9;
Step 2.7: calculate the emotion value of the word $t_c$ using the formula [formula not reproduced in this text];
Step 2.8: add the word $t_c$ to the emotion dictionary, $ew = ew \cup \{t_c\}$, and record its emotion value in $ev$;
Step 2.9: set $c = c + 1$ and return to step 2.5;
Step 2.10: obtain the supplemented emotion word set $ew$ and the corresponding emotion value set $ev$;
Step 3 includes step 3.1: set $r$ as the loop variable for traversing the text set $ft$, and initialise it to 1;
Step 3.2: if $r \le num$, execute step 3.3; otherwise execute step 3.5;
Step 3.3: calculate the emotion value $score_r$ of the text $ft_r$ using the formula [formula not reproduced in this text],
where $f_j$ is the frequency of word $j$ in the text $ft_r$, $rc_r$ is the read count of the text $ft_r$, $min\_rc$ and $max\_rc$ are the minimum and maximum read counts in the text set $ft$, $dl_r$ is the length of the text $ft_r$, and $avgdl$ is the average text length in the text set $ft$;
Step 3.4: set $r = r + 1$ and return to step 3.2;
Step 3.5: calculate the emotion value of the enterprise public opinion from the emotion values of the texts in $ft$ using the formula [formula not reproduced in this text].
CN202011282421.XA 2020-11-16 2020-11-16 Enterprise public opinion data analysis method based on Word2Vec Active CN112347230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282421.XA CN112347230B (en) 2020-11-16 2020-11-16 Enterprise public opinion data analysis method based on Word2Vec

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011282421.XA CN112347230B (en) 2020-11-16 2020-11-16 Enterprise public opinion data analysis method based on Word2Vec

Publications (2)

Publication Number Publication Date
CN112347230A (en) 2021-02-09
CN112347230B (en) 2024-04-19

Family

ID=74362945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011282421.XA Active CN112347230B (en) 2020-11-16 2020-11-16 Enterprise public opinion data analysis method based on Word2Vec

Country Status (1)

Country Link
CN (1) CN112347230B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348644B (en) * 2020-11-16 2024-04-02 上海品见智能科技有限公司 Abnormal logistics order detection method by establishing monotonic positive correlation filter screen

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239439A (en) * 2017-04-19 2017-10-10 同济大学 Public sentiment sentiment classification method based on word2vec
CN107315778A (en) * 2017-05-31 2017-11-03 温州市鹿城区中津先进科技研究院 A kind of natural language the analysis of public opinion method based on big data sentiment analysis
CN110516067A (en) * 2019-08-23 2019-11-29 北京工商大学 Public sentiment monitoring method, system and storage medium based on topic detection
WO2019227710A1 (en) * 2018-05-31 2019-12-05 平安科技(深圳)有限公司 Network public opinion analysis method and apparatus, and computer-readable storage medium
CN111143549A (en) * 2019-06-20 2020-05-12 东华大学 Method for public sentiment emotion evolution based on theme
CN111914096A (en) * 2020-07-06 2020-11-10 同济大学 Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Sensitivity and Performance Analysis of Word2Vec Applied to Emotion State Classification Using a Deep Neural Architecture;Rodrigo Pasti等;《Distributed Computing and Artificial Intelligence, 16th International Conference. DCAI 2019. Advances in Intelligent Systems and Computing》;20190622;199-206 *
基于Word2Vec新词识别的评论情感分析系统的研究与实现;王云龙;《中国优秀硕士学位论文全文数据库信息科技辑》;20190115(第01期);I138-5111 *
股市舆情数据的挖掘与分析研究;张翰垠;《中国优秀硕士学位论文全文数据库信息科技辑》;20190915(第09期);I138-531 *

Also Published As

Publication number Publication date
CN112347230A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
CN106383877B (en) Social media online short text clustering and topic detection method
CN109033307B (en) CRP clustering-based word multi-prototype vector representation and word sense disambiguation method
CN109783639B (en) Mediated case intelligent dispatching method and system based on feature extraction
CN110162591B (en) Entity alignment method and system for digital education resources
CN109189901B (en) Method for automatically discovering new classification and corresponding corpus in intelligent customer service system
CN105893611B (en) Method for constructing interest topic semantic network facing social network
US9141853B1 (en) System and method for extracting information from documents
CN105389341B (en) A kind of service calls repeat the text cluster and analysis method of incoming call work order
CN109885693B (en) Method and system for rapid knowledge comparison based on knowledge graph
CN112347230B (en) Enterprise public opinion data analysis method based on Word2Vec
CN111191825A (en) User default prediction method and device and electronic equipment
CN111626050A (en) Microblog emotion analysis method based on expression dictionary and emotion common sense
CN108960772A (en) Enterprise's evaluation householder method and system based on deep learning
CN111694961A (en) Keyword semantic classification method and system for sensitive data leakage detection
AU2018267668B2 (en) Systems and methods for segmenting interactive session text
CN111930949B (en) Search string processing method and device, computer readable medium and electronic equipment
Nodarakis et al. Using hadoop for large scale analysis on twitter: A technical report
CN111859984B (en) Intention mining method, device, equipment and storage medium
CN111241142A (en) Scientific and technological achievement conversion pushing system and method
US20230004715A1 (en) Method and apparatus for constructing object relationship network, and electronic device
CN107291952B (en) Method and device for extracting meaningful strings
CN112883703B (en) Method, device, electronic equipment and storage medium for identifying associated text
CN113468866B (en) Method and device for analyzing non-standard JSON string
CN115329173A (en) Method and device for determining enterprise credit based on public opinion monitoring
CN110377845B (en) Collaborative filtering recommendation method based on interval semi-supervised LDA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant