CN112699674A - Public opinion classification method for special equipment - Google Patents


Info

Publication number
CN112699674A
Authority
CN
China
Prior art keywords
public opinion
word
special equipment
text
public
Legal status
Pending
Application number
CN202110030059.5A
Other languages
Chinese (zh)
Inventor
陈树芳
李娟
刘丽梅
薛庆
李磊
Current Assignee
Lu An Engineering Technology Service Co Ltd Of Shandong Special Equipment Inspection And Testing Group
Original Assignee
Lu An Engineering Technology Service Co Ltd Of Shandong Special Equipment Inspection And Testing Group
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-04-23
Application filed by Lu An Engineering Technology Service Co Ltd Of Shandong Special Equipment Inspection And Testing Group
Priority to CN202110030059.5A
Publication of CN112699674A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a public opinion classification method for special equipment, comprising the following steps: obtaining a public opinion text, and verifying, splitting and vectorizing it to convert it into word vectors; and performing classification prediction on the word vectors to obtain the special equipment category to which the public opinion relates. When the public opinion text is verified, it is judged whether the text contains missing values or abnormal values, and the public opinion text data are supplemented or removed accordingly. When the text is split, the verified public opinion text undergoes word segmentation and stop-word filtering to obtain a plurality of public opinion data word lists. The scheme enables the analysis and processing of special equipment public opinion data, meets the requirements for classifying special equipment public opinion information, and facilitates the efficient management of special equipment public opinion.

Description

Public opinion classification method for special equipment
Technical Field
The invention relates to the field of special equipment, in particular to a public opinion classification method for special equipment applied to equipment management, which facilitates the emergency handling of special equipment public opinion.
Background
Special equipment refers to boilers, pressure vessels (including gas cylinders), pressure pipelines, elevators, hoisting machinery, passenger ropeways, large-scale amusement facilities and special motor vehicles used within sites (factories), all of which pose a great danger to personal and property safety. The emergency handling capacity for special equipment is an important guarantee for properly handling special equipment emergency safety events, accident rescue and similar work. By the end of 2019, the total number of special equipment units in China had reached about 15.2547 million, so strengthening the emergency handling capacity for special equipment is urgent.
Public opinion is the sum of the emotions, wills, attitudes and opinions that the public holds, within a certain historical period and social space, about public matters that concern them or closely affect their interests. Collecting and reporting public opinion information on special equipment accidents is the basis of special equipment emergency handling. In recent years, researchers have studied special equipment public opinion processing and the development and application of related systems, which has helped improve the capacity to collect and analyze special equipment public opinion. However, the special equipment category and equipment type information contained in public opinion information is not standardized, and classification is often performed manually, which greatly limits the efficiency of public opinion data processing.
Disclosure of Invention
The invention aims to provide a public opinion classification method for special equipment that enables the analysis and processing of special equipment public opinion data, meets the requirements for classifying special equipment public opinion information, and facilitates the efficient management of special equipment public opinion.
To achieve this purpose, the invention provides the following technical scheme: a public opinion classification method for special equipment, comprising the following steps: first, public opinion texts are obtained, verified, split and vectorized so that they are converted into word vectors; then classification prediction is performed on the word vectors to obtain the special equipment category to which the public opinion relates. Determining the special equipment category facilitates public opinion management.
Preferably, when the public opinion text is verified, it is judged whether the text contains missing values or abnormal values, and the public opinion text data are supplemented or removed accordingly. This ensures the accuracy of the original public opinion text data.
Preferably, splitting the public opinion text means performing word segmentation and stop-word filtering on the verified text to obtain a plurality of public opinion data word lists; the WordCloud library is then applied to the resulting word lists to generate a word cloud for display.
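The source does not define what counts as a missing or abnormal value, nor how data are supplemented. The sketch below illustrates the verification step under assumed rules only: each record is a dict with hypothetical `title` and `content` fields, a missing content is supplemented from the title (and a missing title from the content), and a content shorter than an assumed `MIN_CONTENT_LEN` characters is treated as abnormal and removed.

```python
from typing import Dict, List, Optional

# Assumed threshold (not from the patent): shorter contents count as abnormal values.
MIN_CONTENT_LEN = 5


def verify_record(record: Dict[str, Optional[str]]) -> Optional[Dict[str, Optional[str]]]:
    """Return a cleaned copy of one record, or None if it should be removed.

    The field names "title" and "content" are hypothetical.
    """
    title = (record.get("title") or "").strip()
    content = (record.get("content") or "").strip()
    if not title and not content:
        return None  # nothing usable: remove the record
    if not content:
        content = title  # supplement missing content with the title
    if not title:
        title = content[:20]  # supplement missing title with the opening of the content
    if len(content) < MIN_CONTENT_LEN:
        return None  # abnormal value (text too short): remove the record
    return {**record, "title": title, "content": content}


def verify_all(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Apply verify_record to every record and keep the survivors."""
    checked = (verify_record(r) for r in records)
    return [r for r in checked if r is not None]


records = [
    {"title": "某地电梯困人事件", "content": ""},
    {"title": "", "content": "锅炉发生爆炸事故"},
    {"title": "", "content": None},
    {"title": "x", "content": "短"},
]
print(len(verify_all(records)))
```

In a real system the supplementing rule would more likely consult the monitoring system's other fields; the thresholds here exist only to make the check concrete.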
When the public opinion text is segmented, a word-graph scan based on a prefix dictionary generates every possible way the Chinese characters in a sentence can form words, yielding a directed acyclic graph (DAG); dynamic programming then searches for the maximum-probability path, i.e., the segmentation with the highest combined word frequency; for out-of-vocabulary words, a hidden Markov model based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a list suited to the special equipment application field is selected.
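The segmentation procedure described above follows the algorithm documented for the jieba library. The pure-Python sketch below reproduces its core on a toy dictionary with invented frequencies: every proper prefix of a word is added to the prefix dictionary with frequency 0 so that scanning can stop early, a DAG of candidate words is built, and dynamic programming from right to left picks the path with the largest summed log probability. The HMM step for out-of-vocabulary words is omitted here; unknown characters simply become single-character words. The stop-word set is an assumed minimal list.

```python
import math
from typing import Dict, List

# Toy word-frequency dictionary (invented counts, for illustration only).
WORD_FREQ: Dict[str, int] = {
    "电梯": 50, "电": 5, "梯": 3, "困人": 30, "困": 2, "人": 20,
    "事故": 40, "事": 10, "故": 5, "发生": 25, "发": 8, "生": 6, "了": 60,
}
# Assumed minimal stop-word list; a real system would load a full list.
STOP_WORDS = {"了", "的", "在"}


def build_prefix_dict(freq: Dict[str, int]) -> Dict[str, int]:
    """Copy the dictionary and add every proper prefix of each word with frequency 0."""
    pfdict = dict(freq)
    for word in freq:
        for k in range(1, len(word)):
            pfdict.setdefault(word[:k], 0)
    return pfdict


def build_dag(sentence: str, pfdict: Dict[str, int]) -> Dict[int, List[int]]:
    """Map each start index to the end indices of dictionary words starting there."""
    dag: Dict[int, List[int]] = {}
    n = len(sentence)
    for i in range(n):
        ends: List[int] = []
        j = i
        frag = sentence[i]
        while j < n and frag in pfdict:  # stop once frag is not a known prefix
            if pfdict[frag] > 0:         # frequency 0 means "prefix only", not a word
                ends.append(j)
            j += 1
            frag = sentence[i:j + 1]
        dag[i] = ends or [i]  # unknown character: keep it as a single-character word
    return dag


def segment(sentence: str, pfdict: Dict[str, int]) -> List[str]:
    """Pick the maximum-probability path through the DAG by dynamic programming."""
    log_total = math.log(sum(pfdict.values()))
    dag = build_dag(sentence, pfdict)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(pfdict.get(sentence[i:j + 1]) or 1) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def remove_stop_words(words: List[str]) -> List[str]:
    """Drop stop words and whitespace-only tokens."""
    return [w for w in words if w not in STOP_WORDS and w.strip()]


words = segment("电梯发生了困人事故", build_prefix_dict(WORD_FREQ))
print(remove_stop_words(words))
```

Working in log space avoids underflow when many small probabilities are multiplied, which is why the path score is a sum of logarithms.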
During vectorization, the order in which words appear in the segmented, stop-word-filtered public opinion data word list is ignored; only the frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list.
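The word-frequency (bag-of-words) vectorization can be sketched as follows. The vocabulary is the sorted set of words across the tokenized texts, so its length plays the role of $n$; each text becomes a vector of word counts with word order ignored. The sample tokens are illustrative, not taken from the patent's data set, and scikit-learn's `CountVectorizer` implements the same idea as a library component.

```python
from collections import Counter
from typing import List


def build_vocabulary(docs: List[List[str]]) -> List[str]:
    """Word list of the corpus; its length is the dimension n."""
    return sorted({w for doc in docs for w in doc})


def vectorize(doc: List[str], vocab: List[str]) -> List[int]:
    """V = (v_1, ..., v_n): v_i is the count of vocab[i] in doc; word order is ignored."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]  # words outside vocab are silently dropped


# Tokenized, stop-word-filtered texts (illustrative only).
docs = [["电梯", "困人", "电梯"], ["锅炉", "爆炸"], ["气罐", "泄漏", "爆炸"]]
vocab = build_vocabulary(docs)
X = [vectorize(d, vocab) for d in docs]  # input space X = {V_1, ..., V_N}
```

Because unseen words are dropped, the vocabulary should be built from the training texts only and then reused to vectorize the test texts.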
The feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space
$$X = \{V_1, V_2, \dots, V_N\}$$
where $N$ is the number of public opinion samples;
eight special equipment classes and one "other" class are defined, 9 classes in total, so the class space is $C = \{c_1, c_2, \dots, c_9\}$; the public opinion data set can then be expressed as:
[Formula image not reproduced]
$k = 1, 2, \dots, 9$
In classification prediction, the posterior probability of each class is first obtained from:
[Formula image not reproduced]
$k = 1, 2, \dots, 9;\quad j = 1, 2, \dots, N;\quad l = 1, 2, \dots, n;\quad \lambda = 1$
The maximum posterior probability is then obtained from:
[Formula image not reproduced]
and the class with the maximum posterior probability is selected as the special equipment type.
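The two formulas above survive only as missing images. Given the indices $k$, $j$, $l$ and the smoothing constant $\lambda = 1$, they are consistent with a multinomial naive Bayes classifier with Laplace (additive) smoothing, but that reading is an assumption. The sketch below implements such a classifier on word-count vectors: smoothed class priors and per-word conditional probabilities are estimated in log space, and the class with the largest log posterior is returned. scikit-learn's `MultinomialNB(alpha=1.0)` is a library equivalent of the conditional-probability part.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, float], Dict[int, List[float]]]


def train_nb(X: Sequence[Sequence[int]], y: Sequence[int], lam: float = 1.0) -> Model:
    """Estimate log P(c_k) and log P(word_l | c_k) with additive smoothing lam."""
    n = len(X[0])  # vocabulary size
    classes = sorted(set(y))
    log_prior: Dict[int, float] = {}
    log_cond: Dict[int, List[float]] = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        # Smoothed prior: (count of class + lam) / (N + K * lam)
        log_prior[c] = math.log((len(rows) + lam) / (len(X) + lam * len(classes)))
        totals = [sum(r[l] for r in rows) for l in range(n)]  # word counts in class c
        denom = sum(totals) + lam * n
        log_cond[c] = [math.log((t + lam) / denom) for t in totals]
    return log_prior, log_cond


def predict_nb(x: Sequence[int], model: Model) -> int:
    """Return the class with the maximum log posterior for count vector x."""
    log_prior, log_cond = model
    scores = {c: log_prior[c] + sum(v * lp for v, lp in zip(x, log_cond[c]))
              for c in log_prior}
    return max(scores, key=scores.get)


# Toy data: vocabulary ["电梯", "锅炉", "事故"]; labels 4 = elevator, 1 = boiler.
X = [[2, 0, 0], [1, 0, 1], [0, 2, 0], [0, 1, 1]]
y = [4, 4, 1, 1]
model = train_nb(X, y)
print(predict_nb([1, 0, 0], model))
```

The smoothing term keeps a word never seen with a class from forcing that class's posterior to zero.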
Word segmentation supports three modes: (1) accurate mode, which splits sentences most precisely and is suited to text analysis; (2) full mode, which scans out every possible word in a sentence; it is fast but cannot resolve ambiguity; (3) search-engine mode, which further splits long words on the basis of accurate mode and is suited to search-engine indexing.
As described above, the method centers on processing the raw special equipment public opinion text data: data quality checking, sentence splitting and vectorization. The data quality check mainly determines whether the public opinion text has missing or abnormal values and supplements or removes data accordingly. Sentence splitting is mainly achieved by word segmentation and stop-word filtering. Chinese word segmentation algorithms fall into rule-based, statistics-based and combined approaches, and common libraries include jieba, Ansj and ancient word segmentation. Stop-word filtering acts like a filter that removes noise from text data; it is generally implemented with a stop-word list, and a list suited to the application field should be chosen, such as the Harbin Institute of Technology stop-word list or the stop-word list of the Sichuan University Machine Intelligence Laboratory. Text vectorization converts characters or words into word vectors; common methods include one-hot encoding, the bag-of-words method and Word2Vec. In special equipment public opinion preprocessing, keywords can be extracted with TF-IDF, TextRank and similar methods to further extract text features. Classification prediction uses the maximum posterior probability, and this preprocessing also makes it possible to analyze special equipment public opinion data with artificial neural network algorithms.
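In jieba these modes correspond to `jieba.cut(s, cut_all=False)`, `jieba.cut(s, cut_all=True)` and `jieba.cut_for_search(s)`. The self-contained sketch below illustrates the difference on an invented toy dictionary: full mode returns every dictionary word, overlaps included, while search-engine mode re-splits long accurate-mode words into shorter dictionary words. As a simplification, it only tries two-character sub-words; jieba also tries three-character sub-words for longer words.

```python
from typing import List, Set

# Toy dictionary (illustrative only).
DICTIONARY: Set[str] = {"特种", "设备", "特种设备", "压力", "容器", "压力容器", "爆炸"}


def full_mode(sentence: str) -> List[str]:
    """Full mode: every dictionary word in the sentence, overlaps included."""
    n = len(sentence)
    return [sentence[i:j] for i in range(n) for j in range(i + 1, n + 1)
            if sentence[i:j] in DICTIONARY]


def search_engine_mode(accurate_words: List[str]) -> List[str]:
    """Search-engine mode: emit dictionary bigrams of long words, then the word itself."""
    out: List[str] = []
    for w in accurate_words:
        if len(w) > 2:
            out.extend(w[i:i + 2] for i in range(len(w) - 1) if w[i:i + 2] in DICTIONARY)
        out.append(w)
    return out


print(full_mode("特种设备"))
print(search_engine_mode(["压力容器", "爆炸"]))
```

Full mode's overlapping output is why it cannot resolve ambiguity on its own: it lists both "特种设备" and its parts without choosing among them.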
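As an illustration of TF-IDF keyword extraction, the sketch below scores each word of each tokenized text by term frequency times $\log(N / \mathrm{df})$, where $\mathrm{df}$ is the number of texts containing the word. This is one common TF-IDF variant chosen for simplicity; jieba ships its own versions as `jieba.analyse.extract_tags` (TF-IDF) and `jieba.analyse.textrank`. The sample texts are invented and assumed non-empty.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each word of each (non-empty) document by tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()})
    return scores


def top_keywords(doc_scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring words of one document."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]


docs = [["电梯", "困人", "电梯"], ["电梯", "事故"], ["锅炉", "爆炸", "事故"]]
print(top_keywords(tf_idf(docs)[0], 1))
```

In the first text, "电梯" occurs twice but also appears in another text, so the rarer "困人" scores higher; this down-weighting of widespread words is the point of the IDF factor.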
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a word cloud of special equipment public opinion.
FIG. 3 shows a confusion matrix for public sentiment classification prediction of special equipment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the drawings, the invention provides a public opinion classification method for special equipment.
First, a public opinion text is obtained and verified: it is judged whether the text contains missing values or abnormal values, and the public opinion text data are supplemented or removed accordingly.
The public opinion text is then split, i.e., the verified text undergoes word segmentation and stop-word filtering to obtain a plurality of public opinion data word lists. During segmentation, a word-graph scan based on a prefix dictionary generates every possible way the Chinese characters in a sentence can form words, yielding a directed acyclic graph; dynamic programming then searches for the maximum-probability path, i.e., the segmentation with the highest combined word frequency; for out-of-vocabulary words, a hidden Markov model based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a list suited to the special equipment application field is selected, such as the Harbin Institute of Technology stop-word list or the stop-word list of the Sichuan University Machine Intelligence Laboratory.
During vectorization, the order in which words appear in the segmented, stop-word-filtered public opinion data word list is ignored; only the frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list.
The feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space
$$X = \{V_1, V_2, \dots, V_N\}$$
where $N$ is the number of public opinion samples;
eight special equipment classes and one "other" class are defined, 9 classes in total, so the class space is $C = \{c_1, c_2, \dots, c_9\}$; the public opinion data set can then be expressed as:
[Formula image not reproduced]
$k = 1, 2, \dots, 9$
In classification prediction, the posterior probability of each class is first obtained from:
[Formula image not reproduced]
$k = 1, 2, \dots, 9;\quad j = 1, 2, \dots, N;\quad l = 1, 2, \dots, n;\quad \lambda = 1$
The maximum posterior probability is then obtained from:
[Formula image not reproduced]
and the class with the maximum posterior probability is selected as the special equipment type.
For example, special equipment public opinion data covering a certain period were selected from a special equipment public opinion monitoring system: 6,984 records in total, each including the public opinion source, time of occurrence, title and content. The data were processed in Python. After records containing null values were removed, the special equipment class of each sample was labeled manually, and numeric labels were assigned to the special equipment classes to facilitate modeling with machine learning methods, yielding 6,983 valid samples, as shown in Table 1.
Table 1 Distribution of public opinion data by special equipment type
[Table 1 image not reproduced]
Word segmentation and stop-word filtering are performed on the verified public opinion texts to obtain a plurality of public opinion data word lists.
Table 2 Special equipment public opinion data
[Table 2 image not reproduced]
The WordCloud library is applied to the resulting word lists to generate a word cloud for display. A word cloud presents the words that occur most often in a text as an image. As FIG. 2 shows, words such as elevator, trapped persons, accident, gas tank, leakage and explosion appear frequently; their counts are listed in Table 3. Special equipment public opinion therefore has distinct textual features, and text vectorization is performed next.
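The word counts behind a word cloud and a frequency table such as Table 3 can be sketched with `collections.Counter`. The resulting mapping could then be passed to the wordcloud package's `WordCloud(font_path=...).generate_from_frequencies(...)`, where `font_path` must point to a font with Chinese glyphs; that rendering step is left out so the sketch stays dependency-free. The token lists are invented for illustration.

```python
from collections import Counter
from typing import List


def word_frequencies(docs: List[List[str]]) -> Counter:
    """Corpus-wide counts of every word across all tokenized documents."""
    return Counter(w for doc in docs for w in doc)


docs = [
    ["电梯", "困人", "电梯", "电梯"],
    ["气罐", "泄漏", "爆炸"],
    ["电梯", "事故"],
    ["气罐", "爆炸", "事故", "爆炸"],
]
freqs = word_frequencies(docs)
print(freqs.most_common(3))  # highest-frequency words, as in a frequency table
```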
Table 3 Partial word list and word frequencies of special equipment public opinion
[Table 3 image not reproduced]
Using hold-out validation, the special equipment public opinion samples are randomly divided into a training set and a test set in a 75%:25% ratio with the train_test_split method; the public opinion text feature vectors of both sets are then obtained by word-frequency statistics in preparation for modeling.
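The text names scikit-learn's `train_test_split`. The stdlib sketch below shows what a seeded random 75%/25% split does; rounding the test size up is an assumption of this sketch. With the 6,983 valid samples, 6983 × 0.25 = 1745.75, so this version puts 1,746 samples in the test set and 5,237 in the training set. The return order mirrors `train_test_split` (train and test samples, then train and test labels).

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(samples: Sequence[T], labels: Sequence[int],
                     test_size: float = 0.25,
                     seed: int = 0) -> Tuple[List[T], List[T], List[int], List[int]]:
    """Randomly split samples and labels; the test set gets ceil(test_size * N) items."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = math.ceil(len(idx) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq: Sequence, ids: List[int]) -> list:
        return [seq[i] for i in ids]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


samples = list(range(6983))
labels = [0] * 6983
X_train, X_test, y_train, y_test = split_train_test(samples, labels)
print(len(X_train), len(X_test))
```

To avoid leaking test information, the word-frequency vocabulary should be built from `X_train` only.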
A model is built on the training set to obtain the maximum posterior probabilities, and its performance is evaluated on the test set. The resulting confusion matrix is shown in FIG. 3 and the evaluation results in Table 4. In FIG. 3 the labels are: boiler (label 1), pressure vessel (label 2), pressure pipeline (label 3), elevator (label 4), hoisting machinery (label 5), passenger ropeway (label 6), large-scale amusement facility (label 7) and special motor vehicle within sites (factories) (label 8). The scale on the right of FIG. 3 shows the number of public opinion records, and the bottom and left axes show the special equipment categories; because there are few records for special motor vehicles within sites (factories), that category is not listed on the left axis. Precision, recall and the comprehensive evaluation index (F1-score) [17] are used as evaluation metrics, defined as follows:
Precision (P) = number of public opinion records correctly predicted as a category / total number of test records predicted as that category
Recall (R) = number of public opinion records of a category that are correctly predicted / total number of test records of that special equipment type
F1-score = 2PR / (P + R)
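The sketch below computes the three metrics per class with the standard definitions: precision divides correct predictions of a class by everything predicted as that class, and recall divides them by everything that truly belongs to it. Overall accuracy is included for comparison with the 95% figure. Returning 0.0 when a denominator is zero is a convention chosen here; scikit-learn's `classification_report` produces the same per-class numbers.

```python
from typing import Dict, List, Tuple


def per_class_metrics(y_true: List[int],
                      y_pred: List[int]) -> Dict[int, Tuple[float, float, float]]:
    """(precision, recall, F1) for every label seen in y_true or y_pred."""
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)  # records predicted as c
        n_true = sum(1 for t in y_true if t == c)  # records actually in c
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of all test records whose class was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Labels follow FIG. 3: 1 = boiler, 2 = pressure vessel, 4 = elevator.
y_true = [4, 4, 4, 2, 2, 1]
y_pred = [4, 4, 2, 2, 2, 4]
print(per_class_metrics(y_true, y_pred), accuracy(y_true, y_pred))
```

A class with few samples, like the boiler record above, can score zero on every metric even when overall accuracy looks acceptable, which matches the pattern reported for the small classes.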
Table 4 Evaluation results of the special equipment public opinion classification model
[Table 4 image not reproduced]
As Table 4 shows, the overall prediction accuracy of the model reaches 95%. For pressure vessels (label 2) and elevators (label 4), precision, recall and F1-score all exceed 90%; for pressure pipelines (label 3), hoisting machinery (label 5) and large-scale amusement facilities (label 7), precision exceeds 80%, so the overall prediction results of the model are good. Passenger ropeways (label 6), special motor vehicles within sites (factories) (label 8) and boilers (label 1) each have no more than 50 public opinion samples, and their predictions are less satisfactory; however, as public opinion texts accumulate, once the number of records exceeds 90, precision can reach 80%, recall 60% and F1-score more than 70%, so the method has good application prospects.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A public opinion classification method for special equipment, characterized by comprising the following steps:
obtaining a public opinion text, and verifying, splitting and vectorizing the public opinion text to convert it into word vectors;
and performing classification prediction on the word vectors to obtain the special equipment category to which the public opinion relates.
2. The public opinion classification method for special equipment according to claim 1, characterized in that:
when the public opinion text is verified, it is judged whether the text contains missing values or abnormal values, and the public opinion text data are supplemented or removed accordingly.
3. The public opinion classification method for special equipment according to claim 1, characterized in that:
the public opinion text is split by performing word segmentation and stop-word filtering on the verified public opinion text to obtain a plurality of public opinion data word lists;
when the public opinion text is segmented, a word-graph scan based on a prefix dictionary generates every possible way the Chinese characters in a sentence can form words, yielding a directed acyclic graph; dynamic programming then searches for the maximum-probability path, i.e., the segmentation with the highest combined word frequency; for out-of-vocabulary words, a hidden Markov model based on the word-forming capability of Chinese characters is used;
and stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a list suited to the special equipment application field is selected.
4. The public opinion classification method for special equipment according to claim 3, characterized in that:
during vectorization, the order in which words appear in the segmented, stop-word-filtered public opinion data word list is ignored; only the frequency $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \dots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list;
the feature vectors $V$ of all public opinion texts are gathered into the $n$-dimensional input space
$$X = \{V_1, V_2, \dots, V_N\}$$
where $N$ is the number of public opinion samples;
eight special equipment classes and one "other" class are defined, 9 classes in total, so the class space is $C = \{c_1, c_2, \dots, c_9\}$; the public opinion data set can then be expressed as:
[Formula image not reproduced]
5. The public opinion classification method for special equipment according to claim 4, characterized in that: in classification prediction, the posterior probability of each class is first obtained from:
[Formula image not reproduced]
the maximum posterior probability is then obtained from:
[Formula image not reproduced]
and the class with the maximum posterior probability is selected as the special equipment type.
6. The public opinion classification method for special equipment according to claim 3, characterized in that: the WordCloud library is applied to the resulting public opinion data word lists to generate a word cloud for display.
7. The public opinion classification method for special equipment according to claim 3, characterized in that: word segmentation uses three modes: accurate sentence segmentation, segmentation into all possible words, and re-segmentation of long words.
CN202110030059.5A 2021-01-11 2021-01-11 Public opinion classification method for special equipment Pending CN112699674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110030059.5A CN112699674A (en) 2021-01-11 2021-01-11 Public opinion classification method for special equipment

Publications (1)

Publication Number Publication Date
CN112699674A true CN112699674A (en) 2021-04-23

Family

ID=75513726

Country Status (1)

Country Link
CN (1) CN112699674A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045524A (en) * 2016-12-30 2017-08-15 Minzu University of China Method and system for classifying network text public opinion
CN107943941A (en) * 2017-11-23 2018-04-20 Zhuhai Kingsoft Online Game Technology Co., Ltd. Iteratively updatable spam text recognition method and system
CN109800305A (en) * 2018-12-31 2019-05-24 Nanjing University of Science and Technology Microblog emotion classification method based on natural annotation
US20190188260A1 (en) * 2017-12-14 2019-06-20 Qualtrics, Llc Capturing rich response relationships with small-data neural networks
CN112131877A (en) * 2020-09-21 2020-12-25 Minsheng Technology Co., Ltd. Real-time Chinese text word segmentation method for massive data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
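李俊峰 (Li Junfeng): "Design and Implementation of a Public Opinion Monitoring and Analysis System for Special Equipment Accidents and Fault Events", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology *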

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210423