CN112699674A - Public opinion classification method for special equipment

Info

Publication number: CN112699674A
Application number: CN202110030059.5A
Authority: CN
Inventors: 陈树芳; 李娟�; 刘丽梅; 薛庆; 李磊
Original assignee: Lu An Engineering Technology Service Co Ltd Of Shandong Special Equipment Inspection And Testing Group
Current assignee: Lu An Engineering Technology Service Co Ltd Of Shandong Special Equipment Inspection And Testing Group
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-04-23

Abstract

The invention relates to a public opinion classification method for special equipment, comprising the following steps: obtaining a public opinion text, and verifying, splitting and vectorizing the public opinion text to convert it into word vectors; and performing classification prediction on the word vectors to obtain the category of special equipment to which the public opinion relates. When the public opinion text is verified, it is checked for missing values and abnormal values, and the public opinion text data is supplemented or removed accordingly. When the public opinion text is split, the verified text is subjected to word segmentation and stop-word filtering to obtain a plurality of public opinion data word lists. The scheme realizes the analysis and processing of special equipment public opinion data, meets the requirement for classifying special equipment public opinion information, and facilitates efficient management of special equipment public opinion.

Description

Public opinion classification method for special equipment

Technical Field

The invention relates to the field of special equipment, and in particular to a public opinion classification method for special equipment that is applied to equipment management and facilitates the emergency handling of special equipment public opinion.

Background

Special equipment refers to boilers, pressure vessels (including gas cylinders), pressure pipelines, elevators, hoisting machinery, passenger ropeways, large-scale amusement facilities, and special motor vehicles in yards (factories), all of which pose a great danger to personal and property safety. The emergency handling capacity for special equipment is an important guarantee for properly handling emergency safety events, accident rescue and similar work involving special equipment. By the end of 2019, the total amount of special equipment in China had reached about 15.2547 million units, and there is an urgent need to accelerate the building of emergency handling capacity for special equipment.

Public opinion is the sum of the emotions, wishes, attitudes and opinions that the public holds, within a certain historical period and social space, toward public affairs that it cares about or that closely bear on its own interests. Collecting and reporting public opinion information on special equipment accidents is the basis for the emergency handling of special equipment. In recent years, scholars have carried out research on special equipment public opinion processing and on the development and application of related systems, which has played an active role in improving the capacity to collect and analyze special equipment public opinion. However, the special equipment classification and equipment type information contained in public opinion information is not standardized, and classification is often performed manually, which greatly restricts the efficiency of public opinion data processing.

Disclosure of Invention

The invention aims to provide a public opinion classification method for special equipment that realizes the analysis and processing of special equipment public opinion data, meets the requirement for classifying special equipment public opinion information, and facilitates efficient management of special equipment public opinion.

To achieve this purpose, the invention provides the following technical scheme: a public opinion classification method for special equipment, comprising the following steps: first, a public opinion text is obtained, and the text is verified, split and vectorized to convert it into word vectors; then classification prediction is performed on the word vectors to obtain the category of special equipment to which the public opinion relates. Determining the special equipment category facilitates public opinion management.

Preferably, when the public opinion text is verified, it is checked for missing values and abnormal values, and the public opinion text data is supplemented or removed accordingly. This ensures the accuracy of the original public opinion text data.

Preferably, the public opinion text is split by performing word segmentation and stop-word filtering on the verified text to obtain a plurality of public opinion data word lists; the WordCloud library is applied to the obtained word lists to generate a word cloud for display.

When the public opinion text is segmented, word graph scanning is performed based on a prefix dictionary to generate all possible word formations of the Chinese characters in a sentence, from which a directed acyclic graph is built; dynamic programming is used to search for the maximum-probability path and find the best segmentation based on word frequency; for unknown words, a hidden Markov model based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a suitable stop-word list is selected according to the application field of the special equipment.
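The dictionary-based part of this segmentation step can be illustrated with a small sketch. It is not the implementation used in the invention: the dictionary `FREQ`, its frequencies and the stop-word set `STOP_WORDS` are made-up toy values, dictionary lookups stand in for a real prefix dictionary, and the hidden Markov model for unknown words is omitted (unknown characters are simply kept as single-character tokens).

```python
import math

# Hypothetical toy dictionary: word -> frequency (all values are made up).
FREQ = {"电": 5, "梯": 3, "电梯": 50, "困": 4, "人": 20,
        "困人": 30, "事": 6, "故": 2, "事故": 40}
TOTAL = sum(FREQ.values())
STOP_WORDS = {"的", "了"}  # hypothetical stop-word list


def build_dag(sentence, freq):
    """For each start index, list every end index that closes a dictionary word."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # unknown single characters fall back to themselves
    return dag


def best_route(sentence, dag, freq, total):
    """Right-to-left dynamic programming for the maximum log-probability path."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence, freq=FREQ, total=TOTAL):
    """Follow the best route from the start of the sentence and emit the words."""
    route = best_route(sentence, build_dag(sentence, freq), freq, total)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def filter_stop_words(words, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [w for w in words if w not in stop_words]


print(filter_stop_words(segment("电梯的事故")))
```

Working in log-probabilities keeps the path score a simple sum and avoids underflow on long sentences.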

During vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order in which words appear is ignored and only the frequency of occurrence $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \ldots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list.
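A minimal sketch of this word-frequency vectorization follows. The helper names `build_vocabulary` and `to_count_vector` and the two sample token lists are illustrative assumptions, not part of the original method.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Collect the distinct words of all texts in a fixed (sorted) order."""
    return sorted({word for tokens in token_lists for word in tokens})


def to_count_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical token lists, as produced by segmentation and stop-word filtering.
docs = [["电梯", "困人", "电梯"], ["气瓶", "泄漏"]]
vocab = build_vocabulary(docs)                    # n = len(vocab)
X = [to_count_vector(d, vocab) for d in docs]     # one n-dimensional vector per text
print(vocab, X)
```

Each vector has the same length as the vocabulary, so texts of different lengths map into the same n-dimensional space.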

The feature vectors $V$ of all public opinion texts are gathered as the n-dimensional input space:

$X = \{V_1, V_2, \ldots, V_N\}$

where $N$ is the number of public opinion samples.

With 8 special equipment categories and 1 "other" category, 9 classes in total, the classification space is expressed as $C = \{c_1, c_2, \ldots, c_9\}$, and the public opinion data set can then be expressed as:

$k = 1, 2, \ldots, 9$

In classification prediction, the posterior probability of each category is first obtained based on

$k = 1, 2, \ldots, 9$; $j = 1, 2, \ldots, N$; $l = 1, 2, \ldots, n$; $\lambda = 1$.

The maximum posterior probability is then obtained according to the following formula

and the category with the maximum posterior probability is selected as the special equipment type.
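The expressions referred to in this passage do not appear in the text. As a hedged illustration only, one common formulation that is consistent with the indices $k$, $j$, $l$ and with $\lambda = 1$ is a multinomial naive Bayes classifier with additive (Laplace) smoothing. The symbols $y_j$ (label of sample $j$), $v_{jl}$ (count of word $l$ in sample $j$), $w_l$ (word $l$) and the indicator $I(\cdot)$ are assumptions introduced here and are not taken from the original.

```latex
P(c_k) = \frac{\sum_{j=1}^{N} I(y_j = c_k) + \lambda}{N + 9\lambda}, \qquad
P(w_l \mid c_k) = \frac{\sum_{j=1}^{N} I(y_j = c_k)\, v_{jl} + \lambda}
                       {\sum_{l'=1}^{n} \sum_{j=1}^{N} I(y_j = c_k)\, v_{jl'} + n\lambda}

\hat{y} = \arg\max_{c_k} \; P(c_k) \prod_{l=1}^{n} P(w_l \mid c_k)^{v_l}
```

Under this reading, $\lambda = 1$ prevents a word that never occurs in a category from forcing that category's posterior to zero.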

Word segmentation supports three modes: (1) accurate mode, which splits the sentence as precisely as possible and is suitable for text analysis; (2) full mode, which scans out all words in the sentence that can form words, and is fast but cannot resolve ambiguity; and (3) search engine mode, which further splits long words on the basis of accurate mode and is suitable for search engine word segmentation.
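The difference between full mode and search engine mode can be sketched with a toy dictionary. The vocabulary `VOCAB` and both helpers are hypothetical; the search engine helper assumes the text has already been segmented in accurate mode and re-splits long tokens into shorter dictionary words of two or three characters, which is only one plausible way to realize this mode.

```python
# Hypothetical toy vocabulary; a real system would load a full dictionary.
VOCAB = {"特种", "设备", "特种设备", "压力", "容器", "压力容器", "爆炸"}


def full_mode(sentence, vocab=VOCAB):
    """Full mode: emit every dictionary word found at any position (overlaps kept)."""
    return [
        sentence[i:j]
        for i in range(len(sentence))
        for j in range(i + 1, len(sentence) + 1)
        if sentence[i:j] in vocab
    ]


def search_mode(accurate_tokens, vocab=VOCAB):
    """Search engine mode: add the shorter dictionary words inside each long token."""
    out = []
    for token in accurate_tokens:
        for size in (2, 3):
            if len(token) > size:
                out.extend(
                    token[i:i + size]
                    for i in range(len(token) - size + 1)
                    if token[i:i + size] in vocab
                )
        out.append(token)  # the original long token is kept as well
    return out


print(full_mode("压力容器爆炸"))
print(search_mode(["压力容器", "爆炸"]))
```

Full mode returns overlapping candidates and leaves ambiguity unresolved, whereas search engine mode starts from an unambiguous segmentation and only adds finer-grained tokens to improve recall.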

As described above, the method of this scheme centers on verifying the raw data of special equipment public opinion texts, splitting sentences, and vectorization. Data quality verification mainly checks whether the public opinion text contains missing values or abnormal values, and supplements or removes data accordingly. Sentence splitting is mainly realized through word segmentation and stop-word filtering. Chinese word segmentation algorithms can be divided into rule-based segmentation, statistics-based segmentation, and combinations of the two; common libraries include jieba, Ansj and Pangu. Stop-word filtering acts like a filter that removes noise from the text data; it is generally implemented with a stop-word list, which should be chosen to suit the application field, for example the Harbin Institute of Technology stop-word list or the stop-word list of the Machine Intelligence Laboratory of Sichuan University. Text vectorization converts characters or words into word vectors; common methods include One-Hot encoding, bag-of-words and Word2Vec. In the special equipment public opinion preprocessing stage, keyword extraction can be realized with methods such as TF-IDF and TextRank, which facilitates further extraction of text features. Classification prediction adopts the maximum posterior probability, which makes it possible to apply artificial neural network algorithms to the analysis of special equipment public opinion data.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 shows a word cloud of special equipment public opinion.

FIG. 3 shows a confusion matrix for special equipment public opinion classification prediction.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to the attached drawings, the invention provides a public opinion classification method for special equipment.

A public opinion text is first obtained and verified. During verification, the text is checked for missing values and abnormal values, and the public opinion text data is supplemented or removed accordingly.
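The verification step could look roughly like the sketch below. The field names (`title`, `content`, `source`), the minimum-length rule used to flag abnormal values, and the `"unknown"` filler are all assumptions made for illustration; the original does not specify them.

```python
def verify_records(records, required=("title", "content"), min_len=5):
    """Remove records with missing or abnormal values; supplement optional fields."""
    clean = []
    for rec in records:
        # Missing value: a required field is absent, None, or blank -> remove.
        if any(not (rec.get(field) or "").strip() for field in required):
            continue
        # Abnormal value (assumed rule): content too short to be a real report -> remove.
        if len(rec["content"].strip()) < min_len:
            continue
        fixed = dict(rec)
        # Supplement: fill in a missing optional field with a placeholder.
        fixed["source"] = rec.get("source") or "unknown"
        clean.append(fixed)
    return clean


sample = [
    {"title": "电梯困人", "content": "某小区电梯发生困人事故", "source": "news"},
    {"title": "气瓶泄漏", "content": None, "source": "weibo"},
    {"title": "锅炉", "content": "短", "source": "news"},
    {"title": "起重机械倾覆", "content": "工地起重机械发生倾覆", "source": None},
]
checked = verify_records(sample)
print(len(checked), [r["source"] for r in checked])
```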

The public opinion text is then split: the verified text is subjected to word segmentation and stop-word filtering to obtain a plurality of public opinion data word lists. When the text is segmented, word graph scanning is performed based on a prefix dictionary to generate all possible word formations of the Chinese characters in a sentence, from which a directed acyclic graph is built; dynamic programming is used to search for the maximum-probability path and find the best segmentation based on word frequency; for unknown words, a hidden Markov model based on the word-forming capability of Chinese characters is used. Stop-word filtering removes noise from the text data; it is implemented with a stop-word list, and a suitable stop-word list is selected according to the application field of the special equipment, for example the Harbin Institute of Technology stop-word list or the stop-word list of the Machine Intelligence Laboratory of Sichuan University.

During vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order in which words appear is ignored and only the frequency of occurrence $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \ldots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list.

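$X = \{V_1, V_2, \ldots, V_N\}$

where $N$ is the number of public opinion samples;

$k = 1, 2, \ldots, 9$

In classification prediction, the posterior probability of each category is first obtained based on

$k = 1, 2, \ldots, 9$; $j = 1, 2, \ldots, N$; $l = 1, 2, \ldots, n$; $\lambda = 1$.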
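The classification step can be sketched in code under one stated assumption: that the posterior is computed by a multinomial naive Bayes model over word-count vectors with additive smoothing $\lambda = 1$. The text does not spell out the model, so this is an interpretation rather than the original formulation. The class name `CountNaiveBayes` is hypothetical, and categories are numbered from 0 here purely for indexing convenience.

```python
import math


class CountNaiveBayes:
    """Multinomial naive Bayes over word-count vectors with additive smoothing."""

    def __init__(self, n_classes=9, lam=1.0):
        self.n_classes = n_classes
        self.lam = lam

    def fit(self, X, y):
        n_features = len(X[0])
        class_count = [0] * self.n_classes
        word_count = [[0] * n_features for _ in range(self.n_classes)]
        for vec, label in zip(X, y):
            class_count[label] += 1
            for l, v in enumerate(vec):
                word_count[label][l] += v
        total = len(X)
        # Smoothed log prior of each category.
        self.log_prior = [
            math.log((class_count[k] + self.lam) / (total + self.n_classes * self.lam))
            for k in range(self.n_classes)
        ]
        # Smoothed log probability of each word given each category.
        self.log_like = [
            [
                math.log((word_count[k][l] + self.lam)
                         / (sum(word_count[k]) + n_features * self.lam))
                for l in range(n_features)
            ]
            for k in range(self.n_classes)
        ]
        return self

    def predict(self, vec):
        """Return the category index with the maximum (log) posterior score."""
        scores = [
            self.log_prior[k] + sum(v * ll for v, ll in zip(vec, self.log_like[k]))
            for k in range(self.n_classes)
        ]
        return max(range(self.n_classes), key=scores.__getitem__)


# Toy data: three vocabulary words, two of the nine categories populated.
model = CountNaiveBayes().fit([[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]], [0, 0, 1, 1])
print(model.predict([1, 0, 0]), model.predict([0, 2, 0]))
```

Scores are accumulated in log space, so the product over words becomes a sum and long texts do not underflow.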

As an example, a total of 6984 pieces of special equipment public opinion data, covering public opinion source, occurrence time, title, content and so on, were selected from a period in the special equipment public opinion monitoring system and processed in Python. After data items with null values were removed, 6983 valid samples remained. The special equipment category of each sample was labelled manually, and, to facilitate modeling with machine learning methods, a numeric label was assigned to each special equipment category, as shown in Table 1.

TABLE 1 public opinion data distribution for special equipment types

Word segmentation and stop-word filtering are performed on the verified public opinion texts to obtain a plurality of public opinion data word lists.

TABLE 2 public opinion data of special equipment

The WordCloud library is applied to the obtained public opinion data word lists to generate a word cloud for display. A word cloud visually presents, as an image, the words that appear frequently in the text. As shown in FIG. 2, words such as elevator, trapped persons, accident, gas tank, leakage and explosion appear frequently; their occurrence counts are given in Table 3. Special equipment public opinion therefore has distinct text characteristics, and text vectorization is carried out next.

TABLE 3 partial word list and word frequency of special equipment public sentiment

Using a cross-validation method, the special equipment public opinion samples are randomly divided into a training set and a test set in a ratio of 75% to 25% with the train_test_split method. The public opinion text feature vectors of the training set and the test set are then obtained by word-frequency statistics, in preparation for modeling analysis.
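The 75%/25% random split can be illustrated with a standard-library stand-in. The function `split_samples` below is a hypothetical helper written for this sketch, not the train_test_split routine named in the text, and the fixed `seed` is an assumption added for reproducibility.

```python
import random


def split_samples(samples, labels, test_ratio=0.25, seed=0):
    """Shuffle indices once, then cut them into test and training parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data: 100 texts with labels cycling through the 8 equipment categories.
texts = [f"text-{i}" for i in range(100)]
tags = [i % 8 + 1 for i in range(100)]
X_train, X_test, y_train, y_test = split_samples(texts, tags)
print(len(X_train), len(X_test))
```

Shuffling indices rather than the data keeps each text paired with its label in both subsets.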

A model is built on the training set to obtain the maximum posterior probability, and the modeling effect is evaluated on the test set. The resulting confusion matrix is shown in FIG. 3 and the evaluation results in Table 4. In FIG. 3 the labels are: boiler (label 1), pressure vessel (label 2), pressure pipeline (label 3), elevator (label 4), hoisting machinery (label 5), passenger ropeway (label 6), large-scale amusement facility (label 7), and special motor vehicle in yards (factories) (label 8). The right side of FIG. 3 shows the number of public opinion texts, and the bottom and left side show the special equipment category; because there are few public opinion samples for special motor vehicles in yards (factories), that category is not listed on the left side of the figure. Precision, Recall and a comprehensive evaluation index (F1-score) [17] are used as evaluation indexes, defined as follows:

Precision (P) = number of public opinion texts correctly predicted as a given type / total number of texts in the test set predicted as that type

Recall (R) = number of public opinion texts correctly predicted as a given type / total number of texts of that type in the test set

Comprehensive evaluation index (F1-score) = 2PR / (P + R)
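These three indexes can be computed per category from the true and predicted labels of the test set. The sketch below assumes the standard per-class definitions of precision and recall; the label lists are made-up toy data, not results from the evaluation.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one special equipment category."""
    correct = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)   # texts predicted as this type
    actual = sum(t == label for t in y_true)      # texts that really are this type
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 4 = elevator, 2 = pressure vessel, 1 = boiler.
true_labels = [4, 4, 4, 2, 2, 1]
pred_labels = [4, 4, 2, 2, 2, 4]
for category in (1, 2, 4):
    print(category, per_class_metrics(true_labels, pred_labels, category))
```

A category that is never predicted has an undefined precision; this sketch reports it as 0.0, which is a convention chosen here rather than something stated in the text.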

Table 4 public opinion classification model evaluation result table for special equipment

As can be seen from Table 4, the overall prediction accuracy of the model reaches 95%. The precision, recall and F1-score for pressure vessels (label 2) and elevators (label 4) all exceed 90%, and the precision for pressure pipelines (label 3), hoisting machinery (label 5) and large-scale amusement facilities (label 7) exceeds 80%, so the overall prediction results are good. For the three categories of boilers (label 1), passenger ropeways (label 6) and special motor vehicles in yards (factories) (label 8), the total number of public opinion samples is within 50 and the prediction effect is not ideal; however, as public opinion texts accumulate, once the number of public opinion texts exceeds 90 the precision can reach 80%, the recall 60% and the F1-score more than 70%, so the method has good application prospects.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A public opinion classification method for special equipment is characterized by comprising the following steps:

the method comprises the steps of obtaining a public opinion text, and carrying out verification, splitting and vectorization on the public opinion text to convert the public opinion text into word vectors;

and classifying and predicting the word vectors to obtain the special equipment category related to the public sentiment.

2. The special equipment public opinion classification method according to claim 1 is characterized in that:

when the public opinion text is verified, whether the public opinion text has a missing value and an abnormal value is judged, and public opinion text data is supplemented or removed.

3. The special equipment public opinion classification method according to claim 1 is characterized in that:

the public opinion text is split by performing word segmentation and stop-word filtering on the verified public opinion text to obtain a plurality of public opinion data word lists,

when the public opinion text is segmented, word graph scanning is performed based on a prefix dictionary to generate all possible word formations of the Chinese characters in a sentence, from which a directed acyclic graph is built; dynamic programming is used to search for the maximum-probability path and find the best segmentation based on word frequency; and for unknown words, a hidden Markov model based on the word-forming capability of Chinese characters is used;

and stop-word filtering removes noise from the text data, is implemented with a stop-word list, and uses a suitable stop-word list selected according to the application field of the special equipment.

4. The special equipment public opinion classification method as claimed in claim 3, wherein the method comprises the following steps:

during vectorization, for each public opinion data word list obtained after word segmentation and stop-word filtering, the order in which words appear is ignored and only the frequency of occurrence $v_i$ of each word is counted, forming a feature vector $V = \{v_1, v_2, \ldots, v_n\}$ that serves as the public opinion text feature, where $n$ is the dimension of the public opinion data word list.

$X = \{V_1, V_2, \ldots, V_N\}$

where $N$ is the number of public opinion samples;

with 8 special equipment categories and 1 "other" category, 9 classes in total, the classification space is expressed as $C = \{c_1, c_2, \ldots, c_9\}$, and the public opinion data set can then be expressed as:

5. The special equipment public opinion classification method as claimed in claim 4, is characterized in that: in the classification prediction, firstly, the method is based on

6. The special equipment public opinion classification method as claimed in claim 3, wherein the method comprises the following steps: and applying a WordCloud library to the obtained public opinion data table to generate a word cloud for displaying.

7. The special equipment public opinion classification method as claimed in claim 3, wherein the method comprises the following steps: when the word segmentation is carried out, three modes are adopted: precise segmentation of the sentence, scanning of all words in the sentence that can form words, and further splitting of long words.