CN112699674A - Public opinion classification method for special equipment - Google Patents
- Publication number
- CN112699674A
- Authority
- CN
- China
- Prior art keywords
- public opinion
- word
- special equipment
- text
- public
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a public opinion classification method for special equipment, comprising the following steps: obtaining a public opinion text, and verifying, splitting and vectorizing it to convert it into word vectors; and performing classification prediction on the word vectors to obtain the special equipment category to which the public opinion relates. During verification, the public opinion text is checked for missing values and abnormal values, and the public opinion text data are supplemented or removed accordingly. During splitting, the verified public opinion text undergoes word segmentation and stop-word filtering to obtain a number of public opinion data word lists. The scheme enables analysis and processing of special equipment public opinion data, meets the requirements of special equipment public opinion information classification, and supports efficient management of special equipment public opinion.
Description
Technical Field
The invention relates to the field of special equipment, and in particular to a special equipment public opinion classification method for equipment management, which supports the emergency handling of special equipment public opinion.
Background
Special equipment refers to boilers, pressure vessels (including gas cylinders), pressure pipelines, elevators, hoisting machinery, passenger ropeways, large-scale amusement facilities and in-plant (on-site) special motor vehicles [1] that pose a high risk to personal and property safety. The emergency handling capability for special equipment is an important guarantee for properly handling special equipment emergency safety events, accident rescue and similar work. By the end of 2019, the total number of special equipment units in China had reached about 15.25 million, so strengthening the emergency handling capability for special equipment is urgent.
Public opinion is the sum of the emotions, will, attitudes and opinions that the public holds, within a certain historical period and social space, toward public affairs that concern them or are closely related to their own interests. Collecting and reporting public opinion information on special equipment accidents is the basis of special equipment emergency handling. In recent years, researchers have studied special equipment public opinion processing as well as related systems and applications, which has helped improve the capability to collect and analyze special equipment public opinion. However, the special equipment categories and equipment type information in public opinion information are not standardized, and classification is often done manually, which greatly limits the efficiency of public opinion data processing.
Disclosure of Invention
The invention aims to provide a special equipment public opinion classification method that enables the analysis and processing of special equipment public opinion data, meets the requirements of special equipment public opinion information classification, and supports efficient management of special equipment public opinion.
To achieve this purpose, the invention provides the following technical scheme: a public opinion classification method for special equipment, comprising the following steps: first, public opinion texts are obtained, verified, split and vectorized to convert them into word vectors; then classification prediction is performed on the word vectors to obtain the special equipment category to which the public opinion relates. Determining the special equipment category facilitates public opinion management.
Preferably, when the public opinion text is verified, it is checked for missing values and abnormal values, and the public opinion text data are supplemented or removed accordingly. This ensures the accuracy of the original public opinion text data.
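The verification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names follow the fields listed in the embodiment (source, time, title, content), while the supplementation rule (deriving a missing title from the content) and the abnormal-value rule (content that is very short or duplicated) are hypothetical, chosen only to show where each decision sits.

```python
# Minimal sketch of the verification step; field names and rules are assumptions.
REQUIRED_FIELDS = ("source", "time", "title", "content")


def check_records(records, min_len=5, title_len=20):
    """Return verified records: supplement what can be inferred, remove the rest."""
    clean, seen = [], set()
    for record in records:
        record = dict(record)  # work on a copy so the caller's data is untouched
        content = (record.get("content") or "").strip()
        # Hypothetical supplementation rule: derive a missing title from the content.
        if not record.get("title") and content:
            record["title"] = content[:title_len]
        # A missing value that cannot be supplemented: remove the record.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            continue
        # Hypothetical abnormal-value rule: content too short or duplicated.
        if len(content) < min_len or content in seen:
            continue
        seen.add(content)
        clean.append(record)
    return clean
```

The embodiment itself only describes removing items with null values; the supplementation branch is where a domain-specific rule would go.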
Preferably, splitting the public opinion text means performing word segmentation and stop-word filtering on the verified public opinion text to obtain a number of public opinion data word lists; the WordCloud library is then applied to the resulting word lists to generate a word cloud for display.
During word segmentation of the public opinion text, a word graph scan based on a prefix dictionary generates all possible words that the Chinese characters in a sentence can form, from which a directed acyclic graph (DAG) is built; dynamic programming is then used to search for the maximum probability path, i.e., the most probable segmentation based on word frequencies. For unknown (out-of-vocabulary) words, a hidden Markov model (HMM) based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a suitable list is selected according to the special equipment application domain.
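The prefix-dictionary, DAG and dynamic-programming procedure described above matches the algorithm documented for the jieba library. The pure-Python sketch below illustrates it; it is not jieba's code. The tiny word-frequency dictionary `FREQ` is hypothetical, and the HMM step for unknown words is omitted (an unknown character simply becomes a one-character word).

```python
import math

# Hypothetical toy dictionary (word -> frequency); jieba ships a far larger one.
FREQ = {"电梯": 50, "电": 10, "梯": 5, "困人": 20, "困": 8,
        "人": 30, "事故": 40, "事": 15, "故": 6}


def build_prefix_dict(freq):
    """Every word and every prefix of a word is a key; pure prefixes get frequency 0."""
    prefix = {}
    for word, f in freq.items():
        for i in range(1, len(word)):
            prefix.setdefault(word[:i], 0)
        prefix[word] = f
    return prefix


def build_dag(sentence, prefix):
    """dag[k] lists every end index i such that sentence[k:i + 1] is a dictionary word."""
    n = len(sentence)
    dag = {}
    for k in range(n):
        ends, i, frag = [], k, sentence[k]
        while i < n and frag in prefix:
            if prefix[frag] > 0:
                ends.append(i)
            i += 1
            frag = sentence[k:i + 1]
        dag[k] = ends or [k]  # unknown character: keep it as a one-character word
    return dag


def max_prob_route(sentence, dag, freq):
    """Right-to-left dynamic programming: route[k] = (best log-probability of sentence[k:], end index)."""
    n = len(sentence)
    log_total = math.log(sum(freq.values()))
    route = {n: (0.0, 0)}
    for k in range(n - 1, -1, -1):
        route[k] = max(
            (math.log(freq.get(sentence[k:x + 1]) or 1) - log_total + route[x + 1][0], x)
            for x in dag[k]
        )
    return route


def cut(sentence, freq=FREQ):
    """Accurate-mode segmentation along the maximum probability path (no HMM step)."""
    dag = build_dag(sentence, build_prefix_dict(freq))
    route = max_prob_route(sentence, dag, freq)
    words, k = [], 0
    while k < len(sentence):
        end = route[k][1]
        words.append(sentence[k:end + 1])
        k = end + 1
    return words
```

With this toy dictionary, `cut("电梯困人事故")` returns `["电梯", "困人", "事故"]`: the two-character words beat their single-character splits because each extra word adds another negative log-probability term.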
During vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order in which words appear is ignored and only the occurrence frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list (vocabulary).
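A minimal sketch of stop-word filtering followed by this word-frequency (bag-of-words) vectorization, assuming the texts are already segmented. The three-entry stop-word set is a hypothetical placeholder; in practice a domain-appropriate list would be loaded. Each returned vector corresponds to one feature vector $V$, and the list of all vectors to the input space.

```python
from collections import Counter

# Hypothetical stop-word set; a real system would load a domain-specific list.
STOP_WORDS = {"的", "了", "在"}


def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and whitespace-only tokens."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def build_vocabulary(word_lists):
    """Sorted distinct words across all texts; its length is the dimension n."""
    return sorted({w for words in word_lists for w in words})


def to_count_vector(words, vocabulary):
    """V = (v_1, ..., v_n): v_i counts occurrences of vocabulary[i]; word order is ignored."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


def vectorize(token_lists):
    """Filter each text, build the vocabulary, and return (vocabulary, X)."""
    word_lists = [filter_stop_words(tokens) for tokens in token_lists]
    vocabulary = build_vocabulary(word_lists)
    X = [to_count_vector(words, vocabulary) for words in word_lists]
    return vocabulary, X
```

For larger corpora, scikit-learn's `CountVectorizer` (with a custom `tokenizer` and `stop_words` list) produces the same kind of count matrix in sparse form.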
The feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space:
$X = \{V_1, V_2, \dots, V_N\}$
where $N$ is the number of public opinion samples.
Eight special equipment categories plus one "other" category give 9 classes in total, and the class space is expressed as $C = \{c_1, c_2, \dots, c_9\}$. The public opinion data set can then be expressed as:
[equation not reproduced], $k = 1, 2, \dots, 9$
In classification prediction, the posterior probability of each category is first obtained from
[equation not reproduced], $k = 1, 2, \dots, 9$; $j = 1, 2, \dots, N$; $l = 1, 2, \dots, n$; $\lambda = 1$.
The maximum posterior probability is then obtained according to the following formula:
[equation not reproduced]
The category with the maximum posterior probability is selected as the special equipment type.
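The formulas themselves are not reproduced in this text. The index ranges ($k$ over classes, $j$ over samples, $l$ over vocabulary dimensions) together with $\lambda = 1$ are consistent with a multinomial naive Bayes classifier with Laplace smoothing, so the sketch below assumes that model. It is an illustration under that assumption, not a reconstruction of the patent's exact formulas: it estimates smoothed priors $P(c_k)$ and word probabilities $P(w_l \mid c_k)$ from count vectors and predicts the class with the largest log posterior.

```python
import math


def train_naive_bayes(X, y, n_classes, lam=1.0):
    """Estimate Laplace-smoothed log priors and log word probabilities (assumed model).

    X: N count vectors of length n; y: class indices in range(n_classes).
    """
    N, n = len(X), len(X[0])
    class_count = [0] * n_classes
    word_count = [[0] * n for _ in range(n_classes)]
    for vector, k in zip(X, y):
        class_count[k] += 1
        for l, v in enumerate(vector):
            word_count[k][l] += v
    # P(c_k) = (number of samples in class k + lam) / (N + K * lam)
    log_prior = [math.log((class_count[k] + lam) / (N + n_classes * lam))
                 for k in range(n_classes)]
    # P(w_l | c_k) = (count of word l in class k + lam) / (all words in class k + n * lam)
    log_likelihood = []
    for k in range(n_classes):
        total = sum(word_count[k])
        log_likelihood.append([math.log((word_count[k][l] + lam) / (total + n * lam))
                               for l in range(n)])
    return log_prior, log_likelihood


def predict(vector, log_prior, log_likelihood):
    """Return the class index with the maximum (log) posterior, up to a shared constant."""
    scores = [prior + sum(v * ll[l] for l, v in enumerate(vector))
              for prior, ll in zip(log_prior, log_likelihood)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Working in log space avoids underflow when many word probabilities are multiplied. scikit-learn's `MultinomialNB(alpha=1.0)` applies the same additive smoothing to the word probabilities, although its default class prior is the unsmoothed empirical frequency.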
Word segmentation supports splitting sentences, splitting out every possible word, and splitting long words, i.e., three segmentation modes: (1) the accurate mode, which splits a sentence most accurately and is suited to text analysis; (2) the full mode, which scans out every word that can be formed in a sentence; it is fast but cannot resolve ambiguity; (3) the search engine mode, which further splits long words on top of the accurate mode and is suited to search engine segmentation.
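To make the difference between the modes concrete, here is a small pure-Python sketch over a hypothetical dictionary. It is not jieba's code. Full mode lists every dictionary word found anywhere in the sentence (overlapping, so ambiguity is left unresolved). Search engine mode takes an accurate-mode result as input and additionally emits the 2- and 3-character dictionary words inside each longer word, in the style of jieba's `cut_for_search`.

```python
def full_mode(sentence, dictionary, max_len=4):
    """Every dictionary word found anywhere in the sentence; overlaps are allowed."""
    out = []
    for i in range(len(sentence)):
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in dictionary:
                out.append(sentence[i:j])
    return out


def search_mode(accurate_words, dictionary):
    """Re-split long accurate-mode words into shorter dictionary words, then keep the long word."""
    out = []
    for word in accurate_words:
        for size in (2, 3):
            if len(word) > size:
                out.extend(word[i:i + size] for i in range(len(word) - size + 1)
                           if word[i:i + size] in dictionary)
        out.append(word)
    return out


# Hypothetical dictionary for illustration only.
DICTIONARY = {"特种", "设备", "特种设备", "电梯"}
```

For "特种设备电梯", full mode yields 特种, 特种设备, 设备, 电梯 (overlapping words), while search engine mode applied to the accurate result [特种设备, 电梯] yields 特种, 设备, 特种设备, 电梯, which increases recall when the words are used as index terms.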
As described above, the method centers on quality checking of the raw special equipment public opinion text data, sentence splitting, and vectorization. The data quality check mainly determines whether the public opinion text has missing or abnormal values, and supplements or removes data accordingly. Sentence splitting is mainly achieved by word segmentation and stop-word filtering. Chinese word segmentation algorithms fall into rule-based, statistics-based and combined approaches, and common libraries include jieba, Ansj and PanGu. Stop-word filtering acts like a filter that removes noise from text data; it is generally implemented with a stop-word list, which must be chosen for the application domain, e.g., the Harbin Institute of Technology stop-word list or the stop-word list of the Machine Intelligence Laboratory of Sichuan University. Text vectorization converts characters or words into vectors; common methods include one-hot encoding, the bag-of-words method and Word2Vec. In the special equipment public opinion preprocessing stage, keywords can be extracted with methods such as TF-IDF and TextRank to further facilitate text feature extraction. Classification prediction uses the maximum posterior probability, which also makes it possible to analyze special equipment public opinion data with artificial neural network algorithms.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 shows a word cloud of special equipment public opinion.
FIG. 3 shows a confusion matrix for public sentiment classification prediction of special equipment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the drawings, the invention provides a special equipment public opinion classification method.
First, a public opinion text is obtained and verified: the text is checked for missing values and abnormal values, and the public opinion text data are supplemented or removed accordingly.
The public opinion text is then split: word segmentation and stop-word filtering are performed on the verified public opinion text to obtain a number of public opinion data word lists. During word segmentation, a word graph scan based on a prefix dictionary generates all possible words that the Chinese characters in a sentence can form, from which a directed acyclic graph is built; dynamic programming is then used to search for the maximum probability path, i.e., the most probable segmentation based on word frequencies. For unknown words, a hidden Markov model based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a suitable list is selected according to the special equipment application domain, e.g., the Harbin Institute of Technology stop-word list or the stop-word list of the Machine Intelligence Laboratory of Sichuan University.
During vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order in which words appear is ignored and only the occurrence frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list (vocabulary).
The feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space:
$X = \{V_1, V_2, \dots, V_N\}$
where $N$ is the number of public opinion samples.
Eight special equipment categories plus one "other" category give 9 classes in total, and the class space is expressed as $C = \{c_1, c_2, \dots, c_9\}$. The public opinion data set can then be expressed as:
[equation not reproduced], $k = 1, 2, \dots, 9$
In classification prediction, the posterior probability of each category is first obtained from
[equation not reproduced], $k = 1, 2, \dots, 9$; $j = 1, 2, \dots, N$; $l = 1, 2, \dots, n$; $\lambda = 1$.
The maximum posterior probability is then obtained according to the following formula:
[equation not reproduced]
The category with the maximum posterior probability is selected as the special equipment type.
For example, data covering a period of time were selected from a special equipment public opinion monitoring system: 6984 special equipment public opinion records, with fields including source, time of occurrence, title and content, processed with Python. After removing data items with null values, the special equipment category of each sample was labeled manually, and, to facilitate modeling with machine learning methods, a numeric label was assigned to each special equipment category. This yielded 6983 valid samples, as shown in Table 1.
TABLE 1 Distribution of public opinion data by special equipment type
Word segmentation and stop-word filtering are performed on the verified public opinion texts to obtain a number of public opinion data word lists.
TABLE 2 public opinion data of special equipment
The WordCloud library is applied to the resulting word lists to generate a word cloud for display. A word cloud presents the words that appear most often in the text visually as an image. As FIG. 2 shows, words such as elevator, trapped persons, accident, gas tank, leakage and explosion appear frequently; their occurrence counts are given in Table 3. Special equipment public opinion therefore has distinct textual features, and text vectorization is carried out next.
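The frequency counts behind a word cloud and a Table 3-style listing can be computed with `collections.Counter`, as in this sketch; the sample word lists in the usage note are made-up placeholders, not data from the embodiment.

```python
from collections import Counter


def word_frequencies(word_lists):
    """Aggregate word counts over all filtered public opinion word lists."""
    freq = Counter()
    for words in word_lists:
        freq.update(words)
    return freq


def top_words(word_lists, k=10):
    """The k most frequent words with their counts, most frequent first."""
    return word_frequencies(word_lists).most_common(k)
```

For example, `top_words([["电梯", "困人"], ["电梯", "事故"], ["气瓶", "泄漏", "电梯"]], k=1)` gives `[("电梯", 3)]`. The full frequency table could then be passed to the WordCloud library via `WordCloud(font_path=...).generate_from_frequencies(dict(freq))`; a font containing CJK glyphs must be supplied for Chinese words, and its path is deployment-specific.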
TABLE 3 Partial word list and word frequencies of special equipment public opinion
The special equipment public opinion samples are randomly divided into a training set and a test set in a 75% : 25% ratio with the train_test_split method (a hold-out validation split), and the public opinion text feature vectors of the training and test sets are then obtained by word frequency statistics, in preparation for modeling and analysis.
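The 75% : 25% random split can be sketched without external dependencies as below; scikit-learn's `train_test_split(X, y, test_size=0.25, random_state=...)` is the library call the text refers to. The seed value is an arbitrary assumption, and rounding the test share up is a choice made in this sketch.

```python
import math
import random


def split_train_test(samples, labels, test_fraction=0.25, seed=42):
    """Shuffle indices with a fixed seed and return (X_train, X_test, y_train, y_test)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * len(indices))  # round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([samples[i] for i in train_idx], [samples[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])
```

With the 6983 valid samples of the embodiment, this sketch produces 1746 test samples and 5237 training samples. Because small categories (fewer than 50 samples) are sensitive to how they fall across the split, passing `stratify=y` to scikit-learn's `train_test_split` is a common refinement.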
A model is built on the training set to obtain the maximum posterior probability, and its performance is evaluated on the test set. The resulting confusion matrix is shown in FIG. 3 and the evaluation results are listed in Table 4. In FIG. 3 the labels are: boiler (label 1), pressure vessel (label 2), pressure pipeline (label 3), elevator (label 4), hoisting machinery (label 5), passenger ropeway (label 6), large-scale amusement facility (label 7) and in-plant (on-site) special motor vehicle (label 8). The right side of FIG. 3 shows the number of public opinion items, and the bottom and left side show the special equipment categories; because there are very few in-plant special motor vehicle samples, that category is not listed on the left side of the figure. The evaluation indexes are precision, recall and the comprehensive evaluation index (F1-score) [17], defined as follows:
Precision (P) = number of correct predictions for a category / total number of test samples predicted as that category
Recall (R) = number of correct predictions for a category / total number of test samples of that category
Comprehensive evaluation index (F1-score) = 2PR/(P + R)
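The confusion matrix and the per-category precision, recall and F1-score defined above can be computed directly from true and predicted labels. This plain-Python sketch is illustrative; the toy labels in the usage note are invented.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of samples with true label labels[i] predicted as labels[j]."""
    index = {c: i for i, c in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}; an undefined ratio is reported as 0.0."""
    result = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # samples predicted as c
        actual = sum(1 for t in y_true if t == c)     # samples truly in c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[c] = (precision, recall, f1)
    return result
```

For `y_true = [1, 1, 2, 2, 2]` and `y_pred = [1, 2, 2, 2, 1]`, the confusion matrix is `[[1, 1], [1, 2]]`; label 1 gets P = R = F1 = 0.5 and label 2 gets P = R = F1 = 2/3.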
TABLE 4 Evaluation results of the special equipment public opinion classification model
As Table 4 shows, the overall prediction accuracy of the model reaches 95%. For pressure vessels (label 2) and elevators (label 4), precision, recall and F1-score all exceed 90%, and for pressure pipelines (label 3), hoisting machinery (label 5) and large-scale amusement facilities (label 7) the precision exceeds 80%, so the overall prediction performance of the model is good. Passenger ropeways (label 6), in-plant special motor vehicles (label 8) and boilers (label 1) have no more than 50 public opinion samples in total, and their prediction results are less satisfactory. However, as public opinion texts accumulate, once the number of samples exceeds 90, precision can reach 80%, recall 60% and the F1-score more than 70%, so the method has good application prospects.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A public opinion classification method for special equipment, characterized by comprising the following steps:
obtaining a public opinion text, and verifying, splitting and vectorizing the public opinion text to convert it into word vectors;
and performing classification prediction on the word vectors to obtain the special equipment category to which the public opinion relates.
2. The special equipment public opinion classification method according to claim 1, characterized in that:
when the public opinion text is verified, it is determined whether the public opinion text has missing values and abnormal values, and the public opinion text data are supplemented or removed.
3. The special equipment public opinion classification method according to claim 1, characterized in that:
the public opinion text is split by performing word segmentation and stop-word filtering on the verified public opinion text to obtain a number of public opinion data word lists;
when the public opinion text is segmented, a word graph scan based on a prefix dictionary generates all possible words that the Chinese characters in a sentence can form, from which a directed acyclic graph is built; dynamic programming is used to search for the maximum probability path, finding the most probable segmentation based on word frequency; and for unknown words, a hidden Markov model based on the word-forming capability of Chinese characters is used;
and stop-word filtering is used to remove noise from the text data; it is implemented with a stop-word list, and a suitable stop-word list is selected according to the special equipment application domain.
4. The special equipment public opinion classification method according to claim 3, characterized in that:
during vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order of the words is ignored and only the occurrence frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ as the public opinion text feature, where $n$ is the dimension of the public opinion data word list;
the feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space:
$X = \{V_1, V_2, \dots, V_N\}$
where $N$ is the number of public opinion samples;
8 special equipment categories and 1 "other" category give 9 classes in total, the class space being expressed as $C = \{c_1, c_2, \dots, c_9\}$, so that the public opinion data set can be expressed as: [equation not reproduced].
5. The special equipment public opinion classification method according to claim 4, characterized in that: in classification prediction, the posterior probability of each category is first obtained from [equation not reproduced];
then the maximum posterior probability is obtained according to the following formula: [equation not reproduced];
and the category with the maximum posterior probability is selected as the special equipment type.
6. The special equipment public opinion classification method according to claim 3, characterized in that: the WordCloud library is applied to the resulting public opinion data word lists to generate a word cloud for display.
7. The special equipment public opinion classification method according to claim 3, characterized in that: word segmentation supports splitting sentences, splitting out every possible word, and splitting long words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110030059.5A CN112699674A (en) | 2021-01-11 | 2021-01-11 | Public opinion classification method for special equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110030059.5A CN112699674A (en) | 2021-01-11 | 2021-01-11 | Public opinion classification method for special equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112699674A | 2021-04-23 |
Family
ID=75513726
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110030059.5A Pending CN112699674A (en) | 2021-01-11 | 2021-01-11 | Public opinion classification method for special equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112699674A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107045524A (en) * | 2016-12-30 | 2017-08-15 | 中央民族大学 | Method and system for network text public opinion classification |
CN107943941A (en) * | 2017-11-23 | 2018-04-20 | 珠海金山网络游戏科技有限公司 | Iteratively updatable spam text recognition method and system |
CN109800305A (en) * | 2018-12-31 | 2019-05-24 | 南京理工大学 | Microblog emotion classification method based on natural annotation |
US20190188260A1 (en) * | 2017-12-14 | 2019-06-20 | Qualtrics, Llc | Capturing rich response relationships with small-data neural networks |
CN112131877A (en) * | 2020-09-21 | 2020-12-25 | 民生科技有限责任公司 | Real-time Chinese text word segmentation method for massive data |
- 2021-01-11 CN CN202110030059.5A, CN112699674A, Active, Pending
Non-Patent Citations (1)
Title |
---|
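Li Junfeng: "Design and Implementation of a Public Opinion Monitoring and Analysis System for Special Equipment Accidents and Fault Events", China Outstanding Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology * |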
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210423 |