CN112749280A - Internet public opinion classification method, device, electronic device and storage medium - Google Patents

Internet public opinion classification method, device, electronic device and storage medium

Info

Publication number
CN112749280A
Authority
CN
China
Prior art keywords
public opinion
text information
opinion text
public
category
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110061679.5A
Other languages
Chinese (zh)
Inventor
沈嘉怡
范渊
杨勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Application filed by Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN202110061679.5A
Publication of CN112749280A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application relates to a method and a device for classifying internet public opinion, an electronic device and a storage medium. The classification method comprises the following steps: acquiring first public opinion text information from a first website to be analyzed; inputting the first public opinion text information into a fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from that text information; and classifying the first public opinion text information, according to the first target public opinion category, into a first target public opinion category database corresponding to that category. The method and the device solve the problem of the low risk judgment rate of internet public opinion in the related technology and improve that risk judgment rate.

Description

Internet public opinion classification method, device, electronic device and storage medium
Technical Field
The present application relates to the field of network security, and in particular, to a method, an apparatus, an electronic apparatus, and a storage medium for classifying network public opinions.
Background
In recent years, internet public opinion has had an increasing influence on the field of network security, and a number of internet public opinion events have led users to recognize its importance. Meanwhile, if a sudden internet public opinion event is not handled properly, it may cause economic losses to users and threaten network security.
In the related technology, most public opinion monitoring requires a large amount of technical labor for classification and maintenance, and public opinion text information is not clearly classified as positive or negative. As a result, internet public opinion is not analyzed adequately, and risk-related public opinion cannot be identified in time.
At present, no effective solution has been proposed for the problem of low network security caused by the low risk judgment rate of internet public opinion in the related technology.
Disclosure of Invention
The embodiments of the application provide a method, a device, an electronic device and a storage medium for classifying internet public opinion, so as to at least solve the problem of the low risk judgment rate of internet public opinion in the related technology.
In a first aspect, an embodiment of the application provides a method for classifying network public opinions, which includes:
acquiring first public opinion text information in a first website to be analyzed;
inputting the first public opinion text information into a fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from that text information;
and classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
In some embodiments, before inputting the first public opinion text information into a well-trained public opinion text information classification model and obtaining a public opinion category corresponding to the first public opinion text information, the method further includes:
preprocessing the first public opinion text information, wherein the preprocessing comprises information filtering, information completion and information deduplication.
In some embodiments, after classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category, the method further comprises:
acquiring second public opinion text information in a second website to be analyzed;
judging whether the similarity between the second public opinion text information and the first public opinion text information is greater than a preset value;
and under the condition that the similarity between the second public opinion text information and the first public opinion text information is judged to be greater than a preset value, classifying the second public opinion text information into the first target public opinion category database.
In some embodiments, after determining whether the similarity between the second public opinion text information and the first public opinion text information is greater than a preset value, the method further includes:
under the condition that the similarity between the second public opinion text information and the first public opinion text information is judged to be not greater than the preset value, inputting the second public opinion text information into the fully trained public opinion text information classification model to obtain a second target public opinion category corresponding to the second public opinion text information;
and classifying the second public opinion text information into a second target public opinion category database corresponding to the second target public opinion category according to the second target public opinion category.
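The similarity-first routing described in the embodiments above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the similarity measure, the classifier and the preset value of 0.8 are placeholders supplied by the caller, a toy Jaccard word-overlap measure stands in for whatever similarity the application actually uses, and each category database is modelled as an in-memory list. The names `route_text` and `jaccard` are hypothetical.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity in [0, 1] (a stand-in, not the application's measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def route_text(
    new_text: str,
    reference_text: str,
    reference_category: str,
    similarity: Callable[[str, str], float],
    classify: Callable[[str], str],
    databases: Dict[str, List[str]],
    preset_value: float = 0.8,
) -> str:
    """Store new_text in a category database and return the chosen category."""
    if similarity(new_text, reference_text) > preset_value:
        # Similar enough: reuse the category of the already classified text.
        category = reference_category
    else:
        # Otherwise fall back to the fully trained classification model.
        category = classify(new_text)
    databases.setdefault(category, []).append(new_text)
    return category


databases: Dict[str, List[str]] = {"risk": ["data breach at firm"]}
route_text("data breach at firm", "data breach at firm", "risk",
           jaccard, lambda text: "neutral", databases)
```

Passing the classifier in as a callable keeps the routing logic independent of how the model itself is trained.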
In some embodiments, after classifying the second public opinion text information into a second target public opinion category database corresponding to the second target public opinion category according to the second target public opinion category, the method further includes:
acquiring third public opinion text information in a third website to be analyzed;
determining, based on a K-nearest neighbor classification algorithm, a first distance value between the third public opinion text information and the first target public opinion category database and a second distance value between the third public opinion text information and the second target public opinion category database;
and under the condition that the first distance value is smaller than the second distance value, classifying the third public opinion text information into the first target public opinion category database.
In some embodiments, in the case that the first distance value is greater than the second distance value, the third public opinion text information is classified into the second target public opinion category database.
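One possible reading of the K-nearest neighbor comparison above, sketched in Python: the distance value between a text and a category database is taken here as the mean Euclidean distance from the text's feature vector to its k nearest entries in that database, and the text goes to the database with the smaller value. The application defines neither how texts are vectorized nor how the distance value is aggregated, so both are assumptions. The helper names are hypothetical, each database must be non-empty, and ties (a case the application does not address) resolve to the first database listed.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def knn_distance(query: Vector, members: List[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from query to its k nearest members (members must be non-empty)."""
    nearest = sorted(math.dist(query, m) for m in members)[:k]
    return sum(nearest) / len(nearest)


def assign_by_knn(query: Vector, databases: Dict[str, List[Vector]], k: int = 3) -> str:
    """Return the name of the database with the smaller k-NN distance value."""
    # min() keeps the first database on ties, which the application leaves open.
    return min(databases, key=lambda name: knn_distance(query, databases[name], k))


databases = {
    "first": [(0.0, 0.0), (1.0, 0.0)],
    "second": [(10.0, 10.0), (11.0, 10.0)],
}
print(assign_by_knn((0.5, 0.0), databases, k=2))  # first
```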
In some embodiments, the process of training the public opinion text information classification model until it is fully trained includes:
acquiring a plurality of public opinion text information samples and an initial public opinion text information classification model;
selecting a preset threshold number of public opinion text information samples from the plurality of public opinion text information samples as initial cluster centers;
determining the minimum distance between the plurality of public opinion text information samples and the preset threshold number of initial cluster centers;
and training the initial public opinion text information classification model according to the minimum distance until convergence, so as to obtain a completely trained public opinion text information classification model.
In a second aspect, an embodiment of the present application further provides a device for classifying network public opinions, including:
the first acquisition module is used for acquiring first public opinion text information in a first website to be analyzed;
the first input module is used for inputting the first public opinion text information into a fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from that text information;
and the first classification module is used for classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the method for classifying internet public opinions as described in the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for classifying internet public opinions as described in the first aspect.
Compared with the related art, the method, device, electronic device and storage medium for classifying internet public opinion provided by the embodiments of the application acquire first public opinion text information from a first website to be analyzed; input it into a fully trained public opinion text information classification model, which is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from that text information, to obtain a first target public opinion category; and classify the first public opinion text information into the first target public opinion category database corresponding to that category. This solves the problem of the low risk judgment rate of internet public opinion in the related technology and improves that risk judgment rate.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a block diagram of the hardware structure of a terminal running the internet public opinion classification method according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for classifying internet public opinion according to an embodiment of the present application;
Fig. 3 is a block diagram of an apparatus for classifying internet public opinion according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. Words such as "a," "an," and "the" in this application do not limit number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Terms such as "connected" and "coupled" in this application are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular order of the objects.
The method provided by this embodiment can be executed in a terminal, a computer or a similar computing device. Taking a terminal as an example, Fig. 1 is a block diagram of the hardware structure of a terminal running the internet public opinion classification method according to an embodiment of the present application. As shown in Fig. 1, the terminal may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and may optionally include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the terminal. For example, the terminal may include more or fewer components than shown in Fig. 1, or have a different configuration.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the internet public opinion classification method in the embodiment of the present application. The processor 102 runs the computer programs stored in the memory 104 to execute various functional applications and data processing, thereby implementing the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module used to communicate with the internet wirelessly.
This embodiment provides a method for classifying internet public opinion. Fig. 2 is a flowchart of the method according to an embodiment of the present application; as shown in Fig. 2, the flow includes the following steps:
Step S201, acquiring first public opinion text information from a first website to be analyzed.
In this step, the first public opinion text information in the first website to be analyzed may be obtained in real time through web page recognition, or may be obtained from a database in which the first public opinion text information in the first website to be analyzed is stored.
It should be noted that web page recognition may be performed as follows. First, the HTML content of the web page is obtained; this content can generally be divided into three types: text, pictures and videos. For a picture, the SIFT algorithm can be used to recognize the characters in the picture, extract them and weight them. For the text introduction of a video, the words in the introduction are extracted through MFCC and their weight is increased. Integrating these algorithms yields the full text content of the web page; finally, the texts of the different types are segmented into words, reweighted and tuned through parameter adjustment to obtain the final public opinion text information. In some embodiments, the final public opinion text information can then be imported into a database to form an unclassified public opinion database, so that the user can classify the public opinion text information in that database directly.
Hypertext Markup Language (HTML) is a markup language. Its tags unify the format of documents on the web, connecting scattered internet resources into a logical whole. HTML text is descriptive text composed of HTML commands, which can specify words, graphics, animations, sounds, tables, links, and so on.
Scale-invariant feature transform (SIFT) is a computer vision algorithm that detects and describes local features in images.
Mel-frequency cepstral coefficients (MFCC) are mainly used for feature extraction and dimensionality reduction of speech data.
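As a minimal sketch of the first part of this web page recognition, the Python standard-library `html.parser` module can split a page into visible text, image URLs and video URLs. The SIFT-based character recognition, the MFCC-based extraction and the weighting described above are not reproduced here. The class and function names are hypothetical, and skipping `script`/`style` content is an assumption about what counts as page text.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ContentSplitter(HTMLParser):
    """Collect visible text, image URLs and video URLs from an HTML page."""

    SKIP = {"script", "style"}  # assumed: content of these tags is not page text

    def __init__(self) -> None:
        super().__init__()
        self.texts: List[str] = []
        self.images: List[str] = []
        self.videos: List[str] = []
        self._skip_depth = 0
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "video":
            self._in_video = True
            if attrs.get("src"):
                self.videos.append(attrs["src"])
        elif tag == "source" and self._in_video and attrs.get("src"):
            self.videos.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "video":
            self._in_video = False

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.texts.append(data.strip())


def split_page(html: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (texts, image URLs, video URLs) for one page."""
    parser = ContentSplitter()
    parser.feed(html)
    parser.close()
    return parser.texts, parser.images, parser.videos


texts, images, videos = split_page(
    '<p>Breach reported</p><img src="a.png"/>'
    '<video><source src="v.mp4"></video><script>var x = 1;</script>'
)
```

The image and video URLs would then be handed to the separate recognition steps, whose outputs are merged with `texts`.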
Further, the acquired first public opinion text information may be a set of multiple pieces of public opinion text information.
Step S202, inputting the first public opinion text information into a fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from that text information.
In this step, the fully trained public opinion text information classification model is based on the k-means clustering algorithm and, after a certain number of training iterations, achieves a high classification judgment rate for public opinion text information. Classifying public opinion text information with this model can therefore improve classification accuracy.
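A k-means-based model classifies a new text by assigning it to the nearest cluster center. The Python sketch below assumes, hypothetically, that texts are turned into term-frequency vectors over a fixed vocabulary by whitespace tokenization (Chinese text would first need a word segmenter) and that each trained centroid has already been mapped to a category name; the application specifies neither step, and the vocabulary and centroid values are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def vectorize(text: str, vocabulary: List[str]) -> List[float]:
    """Term-frequency vector of text over a fixed vocabulary (whitespace tokens)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[word] / total for word in vocabulary]


def predict_category(text: str, vocabulary: List[str],
                     centroids: Dict[str, Sequence[float]]) -> str:
    """Assign text to the category whose trained centroid is nearest (Euclidean)."""
    vector = vectorize(text, vocabulary)
    return min(centroids, key=lambda name: math.dist(vector, centroids[name]))


vocabulary = ["breach", "leak", "award", "launch"]
centroids = {"risk": [0.5, 0.5, 0.0, 0.0], "positive": [0.0, 0.0, 0.5, 0.5]}
print(predict_category("data breach and leak", vocabulary, centroids))  # risk
```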
Step S203, classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
In this step, a plurality of databases of different categories may be preset, and the database of each category may correspond to one public opinion category, so as to store the public opinion text information of different categories in a classified manner.
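A minimal sketch of such per-category storage using Python's built-in `sqlite3` module, with one database per preset public opinion category. The table schema, the class name `CategoryStore` and the in-memory default are assumptions for illustration; passing a template such as `"opinions_{}.db"` would instead create one file per category.

```python
import sqlite3
from typing import Dict, Iterable, List


class CategoryStore:
    """One SQLite database per preset public opinion category."""

    def __init__(self, categories: Iterable[str], path_template: str = ":memory:") -> None:
        self.dbs: Dict[str, sqlite3.Connection] = {}
        for name in categories:
            # ":memory:" has no "{}" placeholder, so each category gets its own in-memory DB.
            conn = sqlite3.connect(path_template.format(name))
            conn.execute(
                "CREATE TABLE IF NOT EXISTS opinions ("
                "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
            )
            self.dbs[name] = conn

    def add(self, category: str, text: str) -> None:
        """Store text in the database of its category (KeyError if not preset)."""
        conn = self.dbs[category]
        with conn:  # commits the insert on success
            conn.execute("INSERT INTO opinions (text) VALUES (?)", (text,))

    def texts(self, category: str) -> List[str]:
        rows = self.dbs[category].execute("SELECT text FROM opinions ORDER BY id")
        return [row[0] for row in rows]


store = CategoryStore(["risk", "neutral"])
store.add("risk", "data breach reported")
print(store.texts("risk"))  # ['data breach reported']
```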
Through steps S201 to S203, the first public opinion text information is classified by the fully trained public opinion text information classification model. Because that model is based on the k-means clustering algorithm and reaches a high classification judgment rate after a certain number of training iterations, the classification accuracy of public opinion text information is improved. This solves the problem of the low risk judgment rate of internet public opinion in the related technology, improves that rate, and thereby supports internet public opinion risk assessment.
In some embodiments, before the first public opinion text information is input into the fully trained public opinion text information classification model to obtain its public opinion category, the method may further include the following step: preprocessing the first public opinion text information, wherein the preprocessing comprises information filtering, information completion and information deduplication.
In this embodiment, filtering, completing and deduplicating the first public opinion text information improves its quality and thus the classification accuracy of the fully trained public opinion text information classification model.
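The application names the three preprocessing operations but does not define them. The Python sketch below shows one plausible interpretation, with every rule assumed: filtering drops records whose body is shorter than `min_length` characters after whitespace normalization, completion fills a missing title from the first 20 characters of the body, and deduplication discards bodies that are identical after lower-casing (compared via a SHA-256 digest). Records are assumed to be dictionaries with `title` and `content` keys, and `preprocess` is a hypothetical name.

```python
import hashlib
import re
from typing import Dict, List


def preprocess(records: List[Dict[str, str]], min_length: int = 5) -> List[Dict[str, str]]:
    """Filter, complete and deduplicate raw public opinion records (assumed rules)."""
    seen = set()
    cleaned = []
    for record in records:
        content = re.sub(r"\s+", " ", record.get("content") or "").strip()
        # Filtering: drop records whose body is missing or shorter than min_length.
        if len(content) < min_length:
            continue
        # Completion: fill a missing title from the first 20 characters of the body.
        title = (record.get("title") or "").strip() or content[:20]
        # Deduplication: skip bodies already seen (case-insensitive exact match).
        digest = hashlib.sha256(content.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        cleaned.append({"title": title, "content": content})
    return cleaned
```

Near-duplicate detection (for example by similarity rather than exact match) would be a natural refinement but is not required by the text.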
In some embodiments, the process of training the public opinion text information classification model until it is fully trained may include the following steps:
Step 1, obtaining a plurality of public opinion text information samples and an initial public opinion text information classification model.
In this step, to train a network model whose accuracy meets the requirements, the training device may create an initialized network model whose model parameters take initialized values, which may be determined randomly. The network model is subsequently trained on the sample data, during which its model parameters may be adjusted.
The network model may be a convolutional neural network model, a deep learning network model, a recurrent neural network model, an LSTM (Long Short-Term Memory) model, or the like. To reduce training time and the requirements on the training equipment, the initialized network model may be a lightweight deep network model.
The plurality of public opinion text information samples in this step may be labeled in advance, with different labels corresponding to different categories, and the public opinion text information of the same category may form one cluster.
Step 2, selecting a preset threshold number of public opinion text information samples from the plurality of public opinion text information samples as initial cluster centers.
In this step, the preset threshold number of initial cluster centers may be selected randomly or according to a rule set by the user.
Step 3, determining the minimum distance between the plurality of public opinion text information samples and the preset threshold number of initial cluster centers.
In this step, the distance between each initial cluster center and each public opinion text information sample may be calculated, and the minimum distance may then be selected from these distances.
Step 4, training the initial public opinion text information classification model according to the minimum distance until convergence, to obtain a fully trained public opinion text information classification model.
In this step, after the initial model is trained according to the minimum distance, it is further necessary to determine whether the training parameters have converged. If not, steps 2 and 3 are repeated until convergence, yielding a model with a high classification discrimination rate for public opinion text information.
Through the above steps, this embodiment trains the public opinion text information classification model, provides a training mode for it, and improves its classification accuracy.
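Steps 2 and 3 of this training process can be sketched in Python as follows, assuming the samples have already been converted into numeric vectors and taking Euclidean distance as the distance measure (the application specifies neither). Here the preset threshold number is the parameter `k`, and the optional `indices` argument stands in for a user-defined selection rule; both function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def initial_centers(samples: List[Vector], k: int,
                    indices: Optional[List[int]] = None, seed: int = 0) -> List[Vector]:
    """Step 2: choose k samples as initial cluster centers (random or user-given)."""
    if indices is None:
        indices = random.Random(seed).sample(range(len(samples)), k)
    return [samples[i] for i in indices]


def min_distances(samples: List[Vector], centers: List[Vector]) -> List[Tuple[int, float]]:
    """Step 3: for each sample, return (index of nearest center, minimum distance)."""
    result = []
    for x in samples:
        distances = [math.dist(x, c) for c in centers]
        nearest = min(range(len(centers)), key=distances.__getitem__)
        result.append((nearest, distances[nearest]))
    return result


samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = initial_centers(samples, k=2, indices=[0, 2])  # user rule: samples 0 and 2
print(min_distances(samples, centers))  # [(0, 0.0), (0, 1.0), (1, 0.0), (1, 1.0)]
```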
Further, the training mode in this embodiment can be described and illustrated by the following embodiment:
On the basis of the obtained public opinion text information samples and the initial public opinion text information classification model, clustering can be performed by unsupervised learning. Different labels are set for the public opinion text information and marked as "clusters" of different behaviors; this embodiment can adopt the most common clustering algorithm, k-means. The k-means algorithm is an indirect clustering method based on a similarity measure between samples and is an unsupervised learning method. Taking k as a parameter, it divides the public opinion text information into k clusters so that similarity within each cluster is high and similarity between clusters is low. Similarity is calculated with respect to the mean of the objects in a cluster, which is regarded as the cluster center. The algorithm first randomly selects k objects, each representing the centroid of a cluster. Each remaining object is assigned to the cluster it is most similar to, based on its distance to the cluster centroids. A new centroid is then calculated for each cluster. This process is repeated until the criterion function converges, yielding a model with a high classification discrimination rate for public opinion text information. The fully trained public opinion text information classification model can then classify multiple pieces of public opinion text information into different categories.
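The application does not state its criterion function. For reference, the standard k-means criterion is the within-cluster sum of squared distances, where C_j is the j-th cluster and μ_j is its mean (the cluster center); it is assumed here to be the function whose convergence ends training:

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```

Neither the reassignment of objects to their nearest center nor the recomputation of the means can increase J, so the iteration stops once J (equivalently, the set of centers) no longer changes.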
It should be noted that the basic steps of the k-means algorithm are as follows:
Step 1, randomly selecting k pieces of public opinion text information from the n pieces of public opinion text information as initial cluster centers;
Step 2, calculating the distance between each piece of public opinion text information and each cluster center, and reassigning each object to the cluster at the minimum distance;
Step 3, recalculating the mean (that is, the center object) of each changed cluster formed by public opinion texts of the same category;
Step 4, judging whether the criterion function has converged; if so, terminating the algorithm; if not, returning to step 2.
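The four steps above correspond to Lloyd's algorithm for k-means. Below is a self-contained Python sketch, assuming the texts are already numeric vectors and using Euclidean distance. The convergence test (no center moves by more than `tol`) and the choice to keep an empty cluster's previous center are assumptions, since the application only says that the criterion function converges.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(samples: List[Sequence[float]], k: int, max_iter: int = 100,
           tol: float = 1e-9, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means: returns (cluster centers, cluster index of each sample)."""
    rng = random.Random(seed)
    # Step 1: randomly select k of the n samples as the initial cluster centers.
    centers = [list(samples[i]) for i in rng.sample(range(len(samples)), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: reassign every sample to the cluster at minimum distance.
        labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in samples]
        # Step 3: recompute each cluster center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, label in zip(samples, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # assumed: keep an empty cluster's center
        # Step 4: converged once no center moves by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(samples, k=2)
```

For production use, a library implementation such as scikit-learn's `KMeans` adds smarter initialization and multiple restarts.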
It should be noted that, in some embodiments, the model may also be trained with other clustering algorithms to which the solution of the embodiments of the present application is applicable; this is not limited in the embodiments of the present application.
In some embodiments, after the first public opinion text information is classified, according to the first target public opinion category, into the first target public opinion category database corresponding to that category, the method further includes the following steps:
Step 1, obtain second public opinion text information from a second website to be analyzed.
In this step, the second public opinion text information may be obtained in real time through web page recognition, or read from a database in which the second public opinion text information of the second website has been stored.
Step 2, determine whether the similarity between the second public opinion text information and the first public opinion text information is greater than a preset value.
Step 3, if the similarity between the second public opinion text information and the first public opinion text information is greater than the preset value, classify the second public opinion text information into the first target public opinion category database.
In this embodiment, the category of the second public opinion text information is determined directly from its similarity to the first public opinion text information: when the similarity is greater than the preset value, the second text is placed in the first target public opinion category database without running the classification model again, which simplifies the classification of the second public opinion text information.
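The source does not specify how the similarity in Step 2 is measured or what the preset value is. As one possible reading, the sketch below assumes cosine similarity over bag-of-words term counts and a hypothetical preset value of 0.8; it splits on whitespace, so Chinese text would first need word segmentation.

```python
import math
from collections import Counter

PRESET_VALUE = 0.8  # hypothetical threshold; the source gives no number


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts as term-count vectors (0.0 if either is empty)."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def goes_to_first_category(second_text: str, first_text: str,
                           threshold: float = PRESET_VALUE) -> bool:
    # Steps 2 and 3: reuse the first text's category only when the
    # similarity is strictly greater than the preset value.
    return cosine_similarity(second_text, first_text) > threshold
```

Any other text similarity (for example, cosine similarity of TF-IDF vectors or sentence embeddings) could be substituted without changing the threshold logic.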
In some embodiments, after determining whether the similarity between the second public opinion text information and the first public opinion text information is greater than the preset value, the method may further include the following steps:
Step 1, if the similarity between the second public opinion text information and the first public opinion text information is not greater than the preset value, input the second public opinion text information into the fully trained public opinion text information classification model to obtain a second target public opinion category corresponding to the second public opinion text information.
Step 2, classify the second public opinion text information into the second target public opinion category database corresponding to the second target public opinion category.
In this embodiment, the second public opinion text information is classified by the fully trained public opinion text information classification model. Because that model is based on the k-means clustering algorithm and, after a certain number of training iterations, has a higher discrimination rate for public opinion text information, the classification accuracy of public opinion text information is improved. This addresses the low online public opinion risk discrimination rate of the related art, raises that rate, and thereby facilitates online public opinion risk assessment.
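The two branches (similarity above the preset value versus falling back to the trained model) can be combined into one routing function. This is a hedged sketch: the similarity measure, the model's prediction function, the category names and the dictionary used as a stand-in for the category databases are all hypothetical and passed in by the caller.

```python
from typing import Callable, Dict, List


def route_second_text(
    second_text: str,
    first_text: str,
    first_category: str,
    similarity: Callable[[str, str], float],  # hypothetical similarity measure
    model_predict: Callable[[str], str],      # stand-in for the trained model
    databases: Dict[str, List[str]],          # category name -> stored texts
    threshold: float = 0.8,                   # hypothetical preset value
) -> str:
    """Return the category chosen for second_text and store it in that database."""
    if similarity(second_text, first_text) > threshold:
        category = first_category             # similar enough: reuse first category
    else:
        category = model_predict(second_text)  # otherwise ask the trained model
    databases.setdefault(category, []).append(second_text)
    return category
```

The design choice mirrors the text: the cheap similarity check runs first, and the model is only consulted when the new text is not close enough to an already classified one.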
In some embodiments, after the second public opinion text information is classified, according to the second target public opinion category, into the second target public opinion category database corresponding to that category, the method may further include the following steps:
Step 1, obtain third public opinion text information from a third website to be analyzed.
Step 2, based on a K-nearest neighbor classification algorithm, determine a first distance value between the third public opinion text information and the first target public opinion category database, and a second distance value between the third public opinion text information and the second target public opinion category database.
Step 3, if the first distance value is smaller than the second distance value, classify the third public opinion text information into the first target public opinion category database.
In this embodiment, using the first and second target public opinion category databases obtained in the above embodiments, a K-nearest neighbor classification algorithm determines the first distance value from the third public opinion text information to the first target public opinion category database and the second distance value from the third public opinion text information to the second target public opinion category database. When the first distance value is smaller than the second distance value, the third public opinion text information is classified into the first target public opinion category database. This classifies the third public opinion text information, enriches the first target public opinion category database, benefits the subsequent classification of public opinion text information, and improves classification efficiency.
It should be noted that in the K-nearest neighbor (KNN) classification algorithm, "K nearest neighbors" means the K samples closest to a given sample; each sample can be represented by its K nearest neighbors.
In some embodiments, if the first distance value is greater than the second distance value, the third public opinion text information is classified into the second target public opinion category database.
In this embodiment, using the first and second target public opinion category databases obtained in the above embodiments, the K-nearest neighbor classification algorithm determines the first distance value from the third public opinion text information to the first target public opinion category database and the second distance value from the third public opinion text information to the second target public opinion category database. When the first distance value is greater than the second distance value, the third public opinion text information is classified into the second target public opinion category database. This classifies the third public opinion text information, enriches the second target public opinion category database, benefits the subsequent classification of public opinion text information, and improves classification efficiency.
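The source does not define how a "distance value" between a text and a whole category database is obtained from K-nearest neighbors. One plausible reading, sketched below as an assumption, is the mean Euclidean distance from the text's feature vector to its K nearest members in each database, with a hypothetical K of 3. The source also does not cover the case of equal distances, so the sketch returns `None` there. Each database must be non-empty.

```python
import heapq
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def knn_distance(query: Vector, database: List[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from query to its k nearest members of database."""
    nearest = heapq.nsmallest(k, (math.dist(query, v) for v in database))
    return sum(nearest) / len(nearest)


def choose_database(query: Vector, first_db: List[Vector],
                    second_db: List[Vector], k: int = 3) -> Optional[str]:
    d1 = knn_distance(query, first_db, k)   # first distance value
    d2 = knn_distance(query, second_db, k)  # second distance value
    if d1 < d2:
        return "first"   # Step 3: closer to the first target database
    if d1 > d2:
        return "second"  # closer to the second target database
    return None          # equal distances: not specified by the source
```

Averaging over K neighbours rather than using only the single nearest member makes the decision less sensitive to one outlying text stored in a database.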
It should be noted that, in some embodiments, the third public opinion text information may instead be classified by similarity. A first similarity between the third public opinion text information and the public opinion text information of the first target public opinion category database, and a second similarity between the third public opinion text information and the public opinion text information of the second target public opinion category database, are determined. If the first similarity is greater than the second similarity and also greater than a preset third similarity, the third public opinion text information is classified into the first target public opinion category database; if the first similarity is smaller than the second similarity and the second similarity is greater than the preset third similarity, it is classified into the second target public opinion category database; and if both the first and second similarities are smaller than the preset third similarity, a third target public opinion category database is generated from the third public opinion text information according to step 202 to step 203. In this way, multiple public opinion texts can be divided with high precision, and the accuracy of public opinion text information classification is improved.
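The three-way rule above can be written as a small decision function. This sketch takes the two similarities and the preset third similarity as plain numbers (how they are computed is left open by the source); the return labels `"first"`, `"second"` and `"new"` are hypothetical names, and boundary cases the text does not address (ties, or a value exactly equal to the threshold) return `None`.

```python
from typing import Optional


def place_third_text(first_sim: float, second_sim: float,
                     third_sim: float) -> Optional[str]:
    """Decide where the third text goes; third_sim is the preset third similarity."""
    if first_sim > second_sim and first_sim > third_sim:
        return "first"   # first target public opinion category database
    if first_sim < second_sim and second_sim > third_sim:
        return "second"  # second target public opinion category database
    if first_sim < third_sim and second_sim < third_sim:
        return "new"     # generate a third target category database
    return None          # ties / equality with the threshold: unspecified
```

For example, with a preset third similarity of 0.7, similarities of 0.9 and 0.5 send the text to the first database, while 0.3 and 0.2 trigger creation of a new category.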
This embodiment also provides an apparatus for classifying internet public opinion, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated here. As used below, the terms "module", "unit", "subunit" and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of an apparatus for classifying internet public opinion according to an embodiment of the present application. As shown in Fig. 3, the apparatus includes:
a first obtaining module 31, configured to obtain first public opinion text information from a first website to be analyzed;
a first input module 32, coupled to the first obtaining module 31, configured to input the first public opinion text information into a fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, where the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from the text;
a first classification module 33, coupled to the first input module 32, configured to classify the first public opinion text information, according to the first target public opinion category, into the first target public opinion category database corresponding to that category.
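The three-module structure of Fig. 3 can be sketched as a small pipeline. This is an illustrative assumption, not the applicant's design: the fetcher and text-to-vector functions are hypothetical callables supplied by the caller, the "fully trained model" is represented by one k-means centroid per category with nearest-centroid prediction, and an in-memory dictionary stands in for the category databases.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


@dataclass
class NearestCentroidModel:
    """Stand-in for the trained model: one k-means centroid per category."""
    centroids: Dict[str, Vector]

    def predict(self, vector: Vector) -> str:
        return min(self.centroids,
                   key=lambda c: math.dist(vector, self.centroids[c]))


@dataclass
class ClassificationApparatus:
    fetch: Callable[[str], str]         # hypothetical fetcher (module 31)
    vectorize: Callable[[str], Vector]  # hypothetical feature extraction
    model: NearestCentroidModel         # used by module 32
    databases: Dict[str, List[str]] = field(default_factory=dict)

    def run(self, website: str) -> str:
        text = self.fetch(website)                            # module 31
        category = self.model.predict(self.vectorize(text))   # module 32
        self.databases.setdefault(category, []).append(text)  # module 33
        return category
```

Injecting the fetcher and vectoriser as callables keeps the modules loosely coupled, which matches the text's note that modules may be software or hardware and may sit on the same or different processors.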
In some of these embodiments, the apparatus further comprises a preprocessing module, configured to preprocess the first public opinion text information, where the preprocessing includes information filtering, information completion and information de-duplication.
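The source names the three preprocessing operations but does not define them. Under the following assumptions, a minimal sketch could look like this: each record is a dictionary with hypothetical `title` and `content` keys; "filtering" strips HTML tags and drops empty posts; "completion" fills a missing title from the first 20 characters of the content; and "de-duplication" keeps only the first copy of identical content.

```python
import re
from typing import Dict, List

TAG_RE = re.compile(r"<[^>]+>")  # crude HTML-tag matcher (assumption)


def preprocess(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    seen = set()
    cleaned = []
    for rec in records:
        # Information filtering: strip HTML tags and whitespace, drop empty posts.
        content = TAG_RE.sub("", rec.get("content", "")).strip()
        if not content:
            continue
        # Information completion: fill a missing title from the content.
        title = rec.get("title", "").strip() or content[:20]
        # Information de-duplication: keep only the first identical content.
        if content in seen:
            continue
        seen.add(content)
        cleaned.append({"title": title, "content": content})
    return cleaned
```

Exact-match de-duplication is the simplest choice; near-duplicate detection (for example, reusing the similarity threshold from the second-website step) would catch reposted texts with small edits.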
In some of these embodiments, the apparatus further comprises: a second acquisition module, configured to obtain a plurality of public opinion text information samples and an initial public opinion text information classification model; a selection module, configured to select a preset threshold number of public opinion text information samples from the plurality of samples as initial cluster centers; a first determining module, configured to determine the minimum distance between each of the public opinion text information samples and the initial cluster centers; and a training module, configured to train the initial public opinion text information classification model according to the minimum distances until convergence, to obtain the fully trained public opinion text information classification model.
In some of these embodiments, the apparatus further comprises: a third acquisition module, configured to obtain second public opinion text information from a second website to be analyzed; a judging module, configured to determine whether the similarity between the second public opinion text information and the first public opinion text information is greater than a preset value; and a second classification module, configured to classify the second public opinion text information into the first target public opinion category database if that similarity is greater than the preset value.
In some of these embodiments, the apparatus further comprises: a second input module, configured to input the second public opinion text information into the fully trained public opinion text information classification model if the similarity between the second public opinion text information and the first public opinion text information is not greater than the preset value, to obtain a second target public opinion category corresponding to the second public opinion text information; and a third classification module, configured to classify the second public opinion text information into the second target public opinion category database corresponding to the second target public opinion category.
In some of these embodiments, the apparatus further comprises: a fourth acquisition module, configured to obtain third public opinion text information from a third website to be analyzed; a second determination module, configured to determine, based on a K-nearest neighbor classification algorithm, a first distance value between the third public opinion text information and the first target public opinion category database and a second distance value between the third public opinion text information and the second target public opinion category database; and a fourth classification module, configured to classify the third public opinion text information into the first target public opinion category database if the first distance value is smaller than the second distance value.
In some of these embodiments, the apparatus further comprises a fifth classification module, configured to classify the third public opinion text information into the second target public opinion category database if the first distance value is greater than the second distance value.
The above modules may be functional modules or program modules and may be implemented in software or hardware. Modules implemented in hardware may all be located in the same processor, or may be distributed across different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
Step S1, obtain first public opinion text information from a first website to be analyzed.
Step S2, input the first public opinion text information into the fully trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, where the fully trained model is trained based on a k-means clustering algorithm to determine the public opinion category of public opinion text information from the text.
Step S3, classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
It should be noted that, for specific examples of this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
In addition, in combination with the method for classifying internet public opinion in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementing the method. The storage medium stores a computer program which, when executed by a processor, implements any of the above methods for classifying internet public opinion.
Those skilled in the art should understand that the technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these features is described; however, any combination of features that does not contradict itself should be regarded as within the scope of this specification.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A method for classifying internet public opinion, characterized by comprising the following steps:
acquiring first public opinion text information in a first website to be analyzed;
inputting the first public opinion text information into a completely trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the completely trained public opinion text information classification model is trained based on a k-means clustering algorithm, and the completely trained public opinion text information classification model is trained to determine the public opinion category of the public opinion text information according to the public opinion text information;
and classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
2. The method for classifying internet public opinion according to claim 1, wherein before inputting the first public opinion text information into the completely trained public opinion text information classification model to obtain the first target public opinion category corresponding to the first public opinion text information, the method further comprises:
preprocessing the first public opinion text information to obtain preprocessed first public opinion text information, wherein the preprocessing comprises: information filtering, information completion and information de-duplication.
3. The method for classifying internet public opinion according to claim 1, wherein after classifying the first public opinion text information into the first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category, the method further comprises:
acquiring second public opinion text information in a second website to be analyzed;
judging whether the similarity between the second public opinion text information and the first public opinion text information is greater than a preset value;
and under the condition that the similarity between the second public opinion text information and the first public opinion text information is judged to be greater than a preset value, classifying the second public opinion text information into the first target public opinion category database.
4. The method for classifying internet public opinion according to claim 3, wherein after determining whether the similarity between the second public opinion text information and the first public opinion text information is greater than the preset value, the method further comprises:
under the condition that the similarity between the second public opinion text information and the first public opinion text information is judged to be not more than a preset value, inputting the second public opinion text information into a public opinion text information classification model which is completely trained to obtain a second target public opinion category corresponding to the second public opinion text information;
and classifying the second public opinion text information into a second target public opinion category database corresponding to the second target public opinion category according to the second target public opinion category.
5. The method for classifying internet public opinion according to claim 4, wherein after classifying the second public opinion text information into the second target public opinion category database corresponding to the second target public opinion category according to the second target public opinion category, the method further comprises:
acquiring third public opinion text information in a third website to be analyzed;
determining, based on a K-nearest neighbor classification algorithm, a first distance value between the third public opinion text information and the first target public opinion category database and a second distance value between the third public opinion text information and the second target public opinion category database;
and under the condition that the first distance value is smaller than the second distance value, classifying the third public opinion text information into the first target public opinion category database.
6. The method for classifying internet public opinion according to claim 5, wherein the third public opinion text information is classified into the second target public opinion category database in a case that the first distance value is greater than the second distance value.
7. The method for classifying internet public opinion according to claim 1, wherein the training process of the completely trained public opinion text information classification model comprises:
acquiring a plurality of public opinion text information samples and an initial public opinion text information classification model;
selecting a preset threshold number of public opinion text information samples from the plurality of public opinion text information samples as initial cluster centers;
determining the minimum distance between each of the plurality of public opinion text information samples and the preset threshold number of initial cluster centers;
and training the initial public opinion text information classification model according to the minimum distance until convergence, so as to obtain a completely trained public opinion text information classification model.
8. An apparatus for classifying internet public opinion, characterized by comprising:
the first acquisition module is used for acquiring first public opinion text information in a first website to be analyzed;
the first input module is used for inputting the first public opinion text information into a completely trained public opinion text information classification model to obtain a first target public opinion category corresponding to the first public opinion text information, wherein the completely trained public opinion text information classification model is trained based on a k-means clustering algorithm and is trained to determine the public opinion category of the public opinion text information according to the public opinion text information;
and the first classification module is used for classifying the first public opinion text information into a first target public opinion category database corresponding to the first target public opinion category according to the first target public opinion category.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for classifying internet public opinion according to any one of claims 1 to 7.
10. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to perform, when run, the method for classifying internet public opinion according to any one of claims 1 to 7.
CN202110061679.5A 2021-01-18 2021-01-18 Internet public opinion classification method, device, electronic device and storage medium Pending CN112749280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110061679.5A CN112749280A (en) 2021-01-18 2021-01-18 Internet public opinion classification method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110061679.5A CN112749280A (en) 2021-01-18 2021-01-18 Internet public opinion classification method, device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112749280A true CN112749280A (en) 2021-05-04

Family

ID=75652223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110061679.5A Pending CN112749280A (en) 2021-01-18 2021-01-18 Internet public opinion classification method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112749280A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239290A (en) * 2021-06-10 2021-08-10 杭州安恒信息技术股份有限公司 Data analysis method and device for public opinion monitoring and electronic device
CN113609298A (en) * 2021-08-23 2021-11-05 南京擎盾信息科技有限公司 Data processing method and device for court public opinion corpus extraction
CN113762343A (en) * 2021-08-04 2021-12-07 德邦证券股份有限公司 Method, device and storage medium for processing public opinion information and training classification model
CN115796145A (en) * 2022-11-16 2023-03-14 珠海横琴指数动力科技有限公司 Method, system, server and readable storage medium for acquiring webpage text

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105630915A (en) * 2015-12-21 2016-06-01 山东大学 Method and device for classifying and storing pictures in mobile terminals
CN105975573A (en) * 2016-05-04 2016-09-28 北京广利核系统工程有限公司 KNN-based text classification method
CN111339385A (en) * 2020-02-26 2020-06-26 山东爱城市网信息技术有限公司 CART-based public opinion type identification method and system, storage medium and electronic equipment
CN111861596A (en) * 2019-04-04 2020-10-30 北京京东尚科信息技术有限公司 Text classification method and device
CN112115331A (en) * 2020-09-21 2020-12-22 朱彤 Capital market public opinion monitoring method based on distributed web crawler and NLP

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113239290A (en) * 2021-06-10 2021-08-10 杭州安恒信息技术股份有限公司 Data analysis method and device for public opinion monitoring and electronic device
CN113762343A (en) * 2021-08-04 2021-12-07 德邦证券股份有限公司 Method, device and storage medium for processing public opinion information and training classification model
CN113762343B (en) * 2021-08-04 2024-03-15 德邦证券股份有限公司 Method, device and storage medium for processing public opinion information and training classification model
CN113609298A (en) * 2021-08-23 2021-11-05 南京擎盾信息科技有限公司 Data processing method and device for court public opinion corpus extraction
CN115796145A (en) * 2022-11-16 2023-03-14 珠海横琴指数动力科技有限公司 Method, system, server and readable storage medium for acquiring webpage text
CN115796145B (en) * 2022-11-16 2023-09-08 珠海横琴指数动力科技有限公司 Webpage text acquisition method, system, server and readable storage medium

Similar Documents

Publication Publication Date Title
CN112749280A (en) Internet public opinion classification method, device, electronic device and storage medium
CN107835496B (en) Spam short message identification method and device and server
CN108513175B (en) Bullet screen information processing method and system
US20170372169A1 (en) Method and apparatus for recognizing image content
CN109325148A (en) The method and apparatus for generating information
CN110149266B (en) Junk mail identification method and device
CN108874777A (en) A kind of method and device of text anti-spam
CN109308319B (en) Text classification method, text classification device and computer readable storage medium
EP2741473A1 (en) Human-machine interaction data processing method and apparatus
CN110413875A (en) A kind of method and relevant apparatus of text information push
CN111507350B (en) Text recognition method and device
CN110287328A (en) A kind of file classification method, device, equipment and computer readable storage medium
CN107943792B (en) Statement analysis method and device, terminal device and storage medium
Singh et al. Dock: Detecting objects by transferring common-sense knowledge
CN107169106A (en) Video retrieval method, device, storage medium and processor
CN110401545B (en) Chat group creation method, chat group creation device, computer equipment and storage medium
CN108319672B (en) Mobile terminal bad information filtering method and system based on cloud computing
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
CN108416032A (en) A kind of file classification method, device and storage medium
CN112258254B (en) Internet advertisement risk monitoring method and system based on big data architecture
CN113780007A (en) Corpus screening method, intention recognition model optimization method, equipment and storage medium
CN112966072A (en) Case prediction method and device, electronic device and storage medium
CN116226785A (en) Target object recognition method, multi-mode recognition model training method and device
CN113010705B (en) Label prediction method, device, equipment and storage medium
CN112115994A (en) Training method and device of image recognition model, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210504
